ABSTRACT



Matching (e.g., via correlation or similarity process) entities 
having attributes, some of which have associated values. 
The values of the attributes may be adjusted based on the
number of entities that have values for a particular attribute
so that the values decrease as the number increases. The 
attributes of the entities may be harmonized and provided 
with default values so that entities being matched have 
common attributes defined by the union of the attributes of 
the entities being matched. The attributes of the entities may 
be expanded and provided with default values so that the 
entities being matched have attributes that neither had 
originally. Match values may be normalized to provide a 
weight value which may be used to predict an attribute value 
of a new entity based on known attribute values of known 
entities. The weight values may be tuned such that relatively 
high weights are amplified and relatively low weights are 
suppressed. 
METHODS AND APPARATUS, USING
EXPANSION ATTRIBUTES HAVING
DEFAULT VALUES, FOR MATCHING
ENTITIES AND PREDICTING AN
ATTRIBUTE OF AN ENTITY

BACKGROUND OF THE INVENTION 

a. Field of the Invention 

The present invention concerns matching entities based 
on attributes of the entities and predicting an attribute of an 
entity based on attributes of the entity and other entities. 
More specifically, the present invention concerns "collabo- 
rative filtering" which may be used, for example, to suggest 
content of interest to a client entity on a network (e.g., the 
Internet). 

b. Related Art 

In the past five to ten years, computers have become 
interconnected by networks to an ever increasing extent via 
the Internet. The proliferation of networks, in conjunction 
with the increased availability of inexpensive data storage 
means, has afforded computer users unprecedented access to 
a wealth of data. Unfortunately, however, the very vastness 
of available data can overwhelm a user. Desired data can 
become difficult to find and search heuristics employed to 
locate desired data often return unwanted data. 

Various concepts have been employed to help users locate 
desired data. In the context of the Internet for example, some 
services have organized content based on a hierarchy of 
categories. A user may then navigate through a series of 
hierarchical menus to find content that may be of interest to 
them. An example of such a service is the YAHOO™ World 
Wide Web site on the Internet. Unfortunately, content, in the 
form of Internet "web sites" for example, must be organized 
by the service and users must navigate through menus. If a 
user mistakenly believes that a category will be of interest or 
include what they were looking for, but the category turns 
out to be irrelevant, the user must backtrack through one or 
more hierarchical levels of categories. Moreover, such ser- 
vices which provide hierarchical menus of categories are 
passive. That is, a user must actively navigate through the 
hierarchical menus of categories. 

Again in the context of the Internet for example, some 
services provide "search engines" which search database 
content or "web sites" pursuant to a user query. In response 
to a user's query, a rank ordered list, which includes brief 
descriptions of the uncovered content, as well as hypertext 
links (text, having associated Internet address information, 
which, when activated, commands a computer to retrieve 
content from the associated Internet address) to the uncov- 
ered content is returned. The rank ordering of the list is 
typically based on a match between words appearing in the 
query and words appearing in the content. Unfortunately, 
however, present limitations of search heuristics often cause 
irrelevant content to be returned in response to a query. 
Again, unfortunately, the very wealth of available content 
impairs the efficacy of these search engines since it is 
difficult to separate irrelevant content from relevant content. 
Moreover, as was the case with services which provide 
hierarchical menus of categories, search engines are passive. 
That is, a user must actively submit a query. 

The two above-mentioned content search concepts are 
categorized as "pull" processes because the user must 
explicitly direct these processes to find the content and pull 
it to them (i.e., to their computer). 

In view of the drawbacks of the above discussed data 
location concepts, "collaborative filtering" systems have 
been developed. Collaborative filtering systems predict the
preferences of a user based on known attributes of the user,
as well as known attributes of other users. Some collaborative
filtering systems require that a user fill out a survey of
his or her interests and use the submitted survey as a query.
Hence, such collaborative filtering systems may be classified
as "pull" processes. Other collaborative filtering systems are
categorized as "push" processes because they use content
previously "consumed" (e.g., requested, downloaded,
rendered, etc.) by a user to proactively predict content which
may appeal to that user. Such collaborative filtering systems
then present (or "push") the content, or information
identifying the content, to the user.

Basically, collaborative filtering uses known attributes
(e.g., explicitly entered votes) of a new user (referred to as
"the active case") and known attributes of other users to
predict values of unknown attributes of the new user (e.g.,
attributes not yet entered by the new user). The mean vote
($\bar{v}_i$) for an entity i may be defined as:

$$\bar{v}_i = \frac{1}{m_i} \sum_{j \in I_i} v_{i,j}$$

where

$v_{i,j}$ = A value of attribute j of entity i. Typically, an integer
value.

$m$ = The number of attributes (e.g., in a database).

$I_i$ = A set of attribute indexes for which entity i has known
values (e.g., based on an explicitly entered vote). For
example, $I_2 = \{3, 4\}$ means that entity 2 has values (e.g.,
has voted) for attributes 3 and 4.

$m_i$ = The number of attributes for which entity i has known
values (i.e., the number of elements in $I_i$).

Denoting parameters for the active case (i.e., new entity)
with subscript a, a prediction $p_{a,j}$ of active case attribute
values (e.g., votes) for attributes without known values (i.e.,
attributes not in $I_a$) can be defined as:

$$p_{a,j} = \bar{v}_a + \sum_{i=1}^{n} (v_{i,j} - \bar{v}_i)\, w_{a,i}$$

where

$n$ = The number of entities (e.g., in a database).

$w_{a,i}$ = The estimated weight (or alternatively match)
between entity i and entity a.

$p_{i,j}$ = The predicted value of attribute j of entity i.

Hence, a predicted attribute value (e.g., vote) is calculated
from a weighted sum of the attribute values (e.g., votes) of
each other user. The appearance of mean values in the
formula merely serves to express values in terms of deviation
from the mean value (i.e., defines a reference) and has
no other significant impact.
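
The mean vote and the basic prediction formula can be illustrated with a minimal Python sketch. The function names, the dictionary layout, and the sample weights are hypothetical and are used for illustration only. The sketch also assumes that an entity without a known value for attribute j contributes nothing to the sum, a detail the formula leaves open.

```python
from typing import Dict

Votes = Dict[int, float]  # attribute index j -> known value v[i][j]


def mean_vote(votes: Votes) -> float:
    """Mean of an entity's known attribute values (the set I_i)."""
    return sum(votes.values()) / len(votes)


def predict(active: Votes, others: Dict[str, Votes],
            weights: Dict[str, float], j: int) -> float:
    """Predict attribute j for the active case: its own mean vote plus the
    weighted sum of the other entities' deviations from their means."""
    total = 0.0
    for name, votes in others.items():
        if j in votes:  # assumption: no known value for j, no contribution
            total += weights[name] * (votes[j] - mean_vote(votes))
    return mean_vote(active) + total


active = {1: 4.0, 2: 2.0}             # mean vote 3.0
others = {"b": {1: 5.0, 3: 3.0},      # mean 4.0, deviation on attribute 3: -1.0
          "c": {2: 1.0, 3: 5.0}}      # mean 3.0, deviation on attribute 3: +2.0
weights = {"b": 0.5, "c": 0.25}       # hypothetical match weights w[a][i]

# 3.0 + 0.5 * (-1.0) + 0.25 * 2.0
print(predict(active, others, weights, j=3))  # prints 3.0
```

In this sample the two deviations cancel, so the prediction equals the mean vote of the active case. This reflects the role of the mean values as a reference point.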

An example of a proposed collaborative filtering system
is discussed in the article, Resnick et al., "GroupLens: An
Open Architecture for Collaborative Filtering of Netnews,"
Proceedings of the Association for Computing Machinery
1994 Conference on Computer Supported Cooperative
Work, Chapel Hill, N.C., pp. 175-186 (1994) (hereafter
referred to as "the Resnick article"). In the system discussed
in the Resnick article (hereafter referred to as "the GroupLens
system"), users rate articles which they have read.
Rating servers, called Better Bit Bureaus, gather and
disseminate the ratings. More specifically, the Better Bit
Bureaus package one or more ratings into a news article. The
rating servers predict scores based on a heuristic that people
who agreed in the past will probably agree again. More
specifically, the GroupLens system first correlates ratings to
determine the similarity of a user's ratings with the ratings
of other users. Correlation coefficients or weights between
-1 and 1 are computed and indicate how much a particular
user tended to agree with other users. The GroupLens
system then predicts how much the user will like a new
article based on ratings from similar users. More
specifically, the ratings of the other users are weighted based
on the correlation coefficients determined above and the
weighted ratings are combined to form a prediction.

Unfortunately, the GroupLens system has a number of
problems. First, users must explicitly enter ratings. Some
users find it difficult to judge articles or other content. In this
regard, it is expected that predictions made by the GroupLens
system will improve as correlation or weight determinations
improve. It is further expected that the correlation
and weight determinations made by the GroupLens system
will improve as more ratings are entered. Unfortunately,
many users may become frustrated by poor predictions
and/or with entering ratings before enough ratings are gathered
to make the correlation and weight determinations
made by the GroupLens system good. Thus, the GroupLens
system has a bootstrapping problem. Many users will
become frustrated with the predictions made by the GroupLens
system, due, in part, to an initial scarcity of ratings.
As a result of user frustration with initially poor predictions,
such users may stop entering ratings. If this occurs, the
predictions made by the GroupLens system will probably
not improve because users will not provide it with enough
ratings information.

Moreover, the correlation strategy used in the GroupLens
system apparently does not consider the distinctness of the
ratings. For example, the fact that two users might like a
popular article is apparently not weighted less than the fact
that two users might like a very unpopular article.
Furthermore, the GroupLens system apparently does not
consider non-data, or the absence of ratings by users.

Thus, improved content location methods and apparatus
are needed. Since burdens formerly placed on the entity
(e.g., a computer user) should be eliminated to the extent
possible, such methods and apparatus (i) should be useable
in content push systems, such as collaborative filtering
systems for example, and (ii) should use entity attributes
which may be explicitly and/or implicitly determined. Since
the content should be only the most relevant or most likely
to be of interest to the entity, such methods and apparatus
should accurately match entities based on attributes of the
entities and should accurately predict attributes of (e.g.,
content of interest to) an entity based on attributes of the
entity and other entities. Finally, the methods and apparatus
should be able to operate on a distributed environment, such
as a networked environment including clients and servers.

SUMMARY OF THE INVENTION

The present invention provides various ways to enhance
collaborative filtering processes, for example, by enhancing
processes for matching entities. More specifically, a first
way of enhancing a process for matching entities in accordance
with the present invention is to adjust the value ($v_{i,j}$)
of an attribute of an entity based on the number of entities
that have a value for (e.g., have voted on) that particular
attribute. In general, the value ($v_{i,j}$) of the attribute of an
entity will be decreased as the number of entities that have
a value of the attribute increases. Referred to as "inverse
attribute frequency", this aspect of the present invention
lowers the effect of more common attributes in generating
recommendations. The present invention can operate with
many types of entities (e.g., a computer, a local area
network, a business, etc.) and many different attributes (e.g.,
age, sex, weight, favorite food, television shows viewed,
money spent on movie rentals, etc.).

In a second way of enhancing a process for matching
entities in accordance with the present invention, a match of
entities is based on a union of all attributes for which either
entity has a value, rather than only attributes for which both
entities have values. This aspect of the present invention is
referred to below as "attribute set harmonization". Default
values may be provided to attributes without values. Such
values may be referred to as "default" or "implicit" values.

In a similar concept, still another aspect of the present
invention improves a process for matching entities by
determining a match of entities based on attributes for which
neither entity has values. Referred to as "attribute
expansion", this aspect of the present invention is useful
when most attributes can be assumed to have little relevance
to most entities. As with "attribute set harmonization",
default values may be provided to attributes without values.

In a way of enhancing a process for collaborative filtering
in accordance with yet another aspect of the present
invention, the weights used for collaborative filtering
prediction (e.g., the basic collaborative filtering prediction
formula discussed above) are tuned to emphasize relatively
high weights and to suppress relatively low weights. This
aspect of the present invention is referred to below as
"tuning" or "amplification".

A tuned weight $w'$ may be defined as:

$$w' = \begin{cases} w^{\rho} & \text{if } w \ge 0 \\ -(-w)^{\rho} & \text{if } w < 0 \end{cases}$$

where $\rho$ = A tunable parameter, typically greater than 1.

The aspects of the present invention may be used in
content push systems, such as certain collaborative filter
systems for example, so that burdens on a user are minimized.
Also, the aspects of the present invention may
operate on a distributed environment, such as a networked
environment including clients and servers. More
specifically, in the context of the Internet, the present invention
may be employed to predict Internet locations or
content that may be of interest to an end user. The present
invention may operate (a) locally, at a client, (b) remotely, at
a server, or (c) at a location remote from the client and the
server (e.g., a separate server). Finally, particular functions
of the present invention may be distributed at one or more
of the above devices (or locations).

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A is a table which illustrates a data structure of
records for organizing data which may be used by methods
and apparatus of the present invention.

FIG. 1B is a table of exemplary data which is used to
illustrate the problems of the prior art and the operation of
the methods and apparatus of the present invention.

FIG. 2 is a high level block diagram of an illustrative
embodiment of an apparatus of the present invention.

FIGS. 3A-C is a high level flow diagram of an illustrative
process which implements a method of the present invention.

FIGS. 4A-B is a flow diagram of an illustrative attribute
(or content) adjustment process used in the process of FIG.
3.

FIG. 5 is a flow diagram of an illustrative entity matching
process used in the process of FIG. 3.

FIG. 6 is a flow diagram of an illustrative attribute or
desired content prediction process used in the process of
FIG. 3.

FIGS. 7A and 7B are tables which illustrate intermediate
values which result when the illustrative attribute (or
content) adjustment process of FIGS. 3 and 4 operates on the
exemplary data of FIG. 1B.

FIG. 8 is a table which illustrates intermediate values
which result when the illustrative entity matching process of
FIGS. 3 and 5 operates on the exemplary data of FIG. 1B.

FIG. 9 is a table which illustrates intermediate and final
values which result when the illustrative attribute (or desired
content) prediction process of FIGS. 3 and 6 operates on the
exemplary data of FIG. 1B.

FIG. 10 is a high level block diagram of a network
environment in which the methods and apparatus of the
present invention may operate.

FIG. 11 illustrates an exemplary graphical user interface 
which may be used in the network environment of FIG. 10. 

FIG. 12 is a process diagram which illustrates the
operations of methods and apparatus of the present invention in
the network environment of FIG. 10.

FIG. 13A is a data flow diagram which illustrates the
initial provision of entity identification to a client on the
Internet.

FIG. 13B is a data flow diagram which illustrates how a
database of entity/attributes records is built by an Internet
server.

FIG. 13C is a data flow diagram which illustrates the
operation of the present invention as implemented on the
Internet.

FIG. 14 is a high level block diagram of an apparatus for 
implementing the present invention. 

FIG. 15 is a high level flow diagram of an alternative 
collaborative filtering process. 

FIGS. 16A through 16G are flow diagrams which illustrate
the processes carried out by the alternative collaborative
filtering process illustrated in FIG. 15.

FIGS. 17A through 17K illustrate an exemplary input,
intermediate results and an output of data processed by the
alternative collaborative filtering process illustrated in FIG.
15.

FIG. 18 is a high level flow diagram of a correlation
determination and attribute ranking process performed in
accordance with the present invention.

FIG. 19A depicts exemplary input records used for illus- 
trating steps of the processes of FIG. 18. 

FIG. 19B depicts intermediate record data which illustrates
the record attribute harmonization and default attribute
value addition steps of FIG. 18.

FIG. 19C depicts intermediate record data which illustrates
the record expansion and default attribute value addition
steps of FIG. 18.

DETAILED DESCRIPTION

Definitions 

In the following, the term "data" may include information 
representing text (e.g., a word processor document), audio, 
images, pictures (e.g., a JPEG file), video (e.g., an MPEG-2 
file), or any other information which, by itself, or as output
by appropriate means, can be sensed. Such data may also be 
referred to as "content." 
The term "entity" may include a computer, a computer
user, or a logical grouping of people such as people working 
at a particular company, people sharing a particular local 
area network (or "LAN"), a network browser, an inter- 
network browser, etc. An entity may include certain 
"attributes". If, for example, an "entity" is a computer user, 
that user may have "attributes" such as age, sex, education 
level, residence, a computer network address, food 
preferences, movie preferences, etc. 

An attribute may reflect a set which the entity is either in 
or not in (e.g., "Male?", "over 60 years old?", "rented a 
video tape within the last week?", etc.) so that the attribute 
value is a qualitative binary (e.g., YES/NO, ON/OFF, 
TRUE/FALSE) value. Such an attribute may be referred to 
as a "qualitative attribute". On the other hand, an attribute 
may reflect a set of values so that the attribute value is a 
value within the set (e.g., "57 years" is a value within an 
"age" set, "red" is a value within a "favorite color" set, "7" 
is a value within a "number of video tapes rented in March"
set, etc.). Such an attribute may be referred to as a "value
attribute". "Value attributes" may be expressed as a group of
"qualitative attributes" associated with each value of a set
(e.g., "rented 1 video tape?", "rented 2 video tapes?", etc.).
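
The expression of a value attribute as a group of qualitative attributes may be sketched in Python as follows. The function name `to_qualitative`, the label format, and the sample domain are hypothetical and are used for illustration only.

```python
from typing import Dict, Iterable


def to_qualitative(name: str, value: object,
                   domain: Iterable[object]) -> Dict[str, bool]:
    """Expand one value attribute into one YES/NO attribute per possible value."""
    return {f"{name} = {v}?": (v == value) for v in domain}


# A user who rented 2 video tapes, with an assumed domain of 0 through 3 tapes.
flags = to_qualitative("tapes rented", 2, domain=range(4))
for question, answer in flags.items():
    print(question, answer)
```

Exactly one of the resulting qualitative attributes is true for a given entity, so the original value remains recoverable.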

The present invention concerns novel methods and appa- 
ratus for matching entities based on attributes of the entities 
and for predicting an attribute of (or content of interest to) 
an entity based on attributes of the entity and other entities. 
The following description is presented to enable one skilled 
in the art to make and use the invention, and is provided in 
the context of a particular application and its requirements. 
Various modifications to the illustrated embodiment(s) will 
be apparent to those skilled in the art, and the general 
principles set forth below may be applied to other embodi- 
ments and applications. Thus, the present invention is not 
intended to be limited to the embodiment(s) shown.

The present invention provides various ways to enhance 
collaborative filtering processes, for example, by enhancing 
processes for matching entities. More specifically, a first 
way of enhancing a process for matching entities in accordance
with the present invention is to adjust the value ($v_{i,j}$)
of an attribute of an entity based on the number of entities
that have a value for (e.g., have voted on) that particular
attribute. In general, the value ($v_{i,j}$) of the attribute of an
entity will be decreased as the number of entities that have 
a value of the attribute increases. Referred to below as 
"inverse attribute frequency", this aspect of the present 
invention lowers the effect of more common attributes in 
generating recommendations. 

In a second way of enhancing a process for matching 
entities in accordance with the present invention, a match of 
entities is based on a union of all attributes for which either 
entity has a value, rather than only attributes for which both 
entities have values. This aspect of the present invention is 
referred to below as "attribute set harmonization". Default 
values may be provided to attributes without values. 

In a similar concept, still another aspect of the present 
invention improves a process for matching entities by deter- 
mining a match of entities based on attributes for which 
neither entity has values. Referred to below as "attribute 
expansion", this aspect of the present invention is useful 
when most attributes can be assumed to have little relevance 
to most entities. As with "attribute set harmonization", 
default values may be provided to attributes without values. 

In a way of enhancing a process for collaborative filtering 
in accordance with yet another aspect of the present 
invention, the weights used for collaborative filtering
prediction (e.g., the basic collaborative filtering prediction
formula discussed above) are tuned to emphasize relatively
high weights and to suppress relatively low weights. This 
aspect of the present invention is referred to below as 
"tuning" or "amplification". 

In the following, first the basic functions and general
concepts of the various aspects of the present invention will
be disclosed. Next, a generic apparatus for practicing the
present invention (with reference to FIGS. 2 and 14) will be
disclosed. Thereafter, the operation of the inverse attribute
frequency aspect of the present invention with respect to a
simple matching process (with reference to FIGS. 3 through
6) will be disclosed and an example of the operation (with
reference to FIGS. 3 through 9) will be provided. Next, the
operation of the inverse attribute frequency aspect of the
present invention with respect to a collaborative filtering
process which uses a similarity-type matching process will
be disclosed (with reference to FIGS. 15 and 16A through
16G) and an example of the operation of the collaborative
filtering process (with reference to FIGS. 17A through 17K)
will be provided. Thereafter, the operation of the inverse
attribute frequency, attribute set harmonization (with default
values), attribute expansion (with default values), and tuning
aspects of the present invention with respect to a collaborative
filtering process which uses a correlation matching
process will be disclosed with reference to FIGS. 18 and
19A through 19C. Then, an architecture for practicing the
present invention on a network having a client and a server
(with reference to FIGS. 10 and 11) will be disclosed.
Finally, the processing and data flow within the networked
architecture (with reference to FIGS. 10-13C) will be
disclosed.

§1 Functions Performed by the Methods and Apparatus 

The present invention operates to better "match" entities 
based on attributes of the entities and/or to better predict an 
attribute of (e.g., content of interest to) an entity based on
attributes of the entity and other entities. 
§1.1 Inverse Attribute Frequency Matchings 
Conceptually, the inverse attribute frequency aspect of the
present invention discounts more common entity attributes
and emphasizes more rare entity attributes when matching
entities and when predicting unknown attribute values. For
example, suppose that the entities are television viewers and
the relevant attribute is their favorite television program.
Many television viewers watch a sporting event such as the
SuperBowl™ on the FOX network, while many fewer
television viewers watch a sporting event such as the NCAA
Division III Women's Field Hockey Finals on the ESPN
network. Thus, the present invention is based, in part, on the
assumption that two viewers who watched the field hockey
finals "match" to a greater degree than two viewers who
watched the SuperBowl™. The present invention is based,
in part, on the further assumption that the two viewers who
watched the field hockey finals may have other common
television viewing preferences, and in any event, more so
than two viewers who watched the SuperBowl™.

In the above example, the entities were television viewers 
and the attribute was favorite television program. The 
present invention can operate with many types of entities 
(e.g., a computer, a local area network, a business, etc.) and 
many different attributes (e.g., age, sex, weight, favorite
food, television shows viewed, money spent on movie 
rentals, etc.). 
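
The discounting of common attributes may be sketched in Python as follows. This section states only that a value decreases as the number of entities having a value for the attribute increases. The scaling factor log(n / n_j) used in the sketch is therefore an assumption, borrowed by analogy from inverse document frequency weighting. All function names and the sample viewer data are hypothetical.

```python
import math
from typing import Dict

Records = Dict[str, Dict[str, float]]  # entity -> {attribute: value}


def inverse_attribute_frequency(records: Records) -> Dict[str, float]:
    """Assumed factor log(n / n_j), where n is the number of entities and
    n_j counts the entities that have a value for attribute j."""
    n = len(records)
    counts: Dict[str, int] = {}
    for attrs in records.values():
        for name in attrs:
            counts[name] = counts.get(name, 0) + 1
    return {name: math.log(n / n_j) for name, n_j in counts.items()}


def adjust(records: Records) -> Records:
    """Multiply every known value by the factor for its attribute."""
    factor = inverse_attribute_frequency(records)
    return {entity: {a: v * factor[a] for a, v in attrs.items()}
            for entity, attrs in records.items()}


viewers = {
    "u1": {"SuperBowl": 1.0, "field hockey finals": 1.0},
    "u2": {"SuperBowl": 1.0, "field hockey finals": 1.0},
    "u3": {"SuperBowl": 1.0},
    "u4": {"SuperBowl": 1.0},
}
adjusted = adjust(viewers)
print(adjusted["u1"])
```

Under this assumed factor, an attribute for which every entity has a value is scaled to zero, while the rarer field hockey attribute keeps a positive value of log 2.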

A similar concept has been employed in document 
retrieval systems so that larger weights are assigned to terms 
that occur frequently in particular documents, but rarely in
other documents, since such terms can distinguish the
documents in which they occur from other documents. (See, e.g.,
the article: Salton et al., "Automatic Structuring and
Retrieval of Large Text Files," Communications of the ACM, 
Vol. 37, No. 2, pp. 97-108 (February 1994) (hereafter 
referred to as "the Salton article").) This weighting is used 
either (a) to retrieve a document in response to an explicitly 
entered query, or (b) to determine the similarity of docu- 
ments. This concept, however, has not been used in collabo- 
rative filtering applications to predict attributes of one entity 
(or content which the entity might want) based on other 
entities. That is, the words in one document are not used to 
predict other words that the document might contain based 
on the words found in documents with similar words; such 
an operation would be absurd in the document retrieval 
process. Moreover, the documents are limited to one 
attribute type; namely words or terms. Furthermore, the
words found in a document are inherently and objectively 
based on the content of the document itself; they are not 
implicitly determined or subjectively determined. 

§1.2 Attribute Set Harmonization (with Default Values) 
for Use with Matching Processes 

When determining a degree to which two (2) entities 
"match", known matching and collaborative filtering pro- 
cesses consider only attributes for which both of the entities 
have values. For example, in the GroupLens system dis- 
cussed above, only documents on which both the active user 
and another user have voted are considered. Thus, if $I_1$
represents a set of documents that a first user has voted on
and $I_2$ represents a set of documents that a second user has
voted on, when determining a degree to which the first and
second users "match", only the values of documents of a set
defined by an intersection of documents voted on by each
user (i.e., $I_1 \cap I_2$) are considered.

The inventors of the present invention have found that in 
certain applications, harmonizing the sets of the attributes 
considered in a matching process, such that they include all 
attributes for which either of the entities have values, 
provides a better match determination, particularly for pur- 
poses of collaborative filtering to predict content of interest 
to an Internet computer user. Thus, in accordance with the 
attribute set harmonization aspect of the present invention, 
when determining a degree to which the first and second 
entities "match", the values of attributes of a set defined by 
a union of attributes, having values, of each entity (i.e.,
$I_1 \cup I_2$) are considered. A default value may then be assigned
to attributes without values, of an entity. Such values may be 
referred to as "default" or "implicit" values. 
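
The difference between the intersection used by known processes and the union used by attribute set harmonization may be sketched in Python as follows. The function name `harmonize`, the document identifiers, and the default value of 0.0 are hypothetical choices made for illustration. This section does not fix a particular default value.

```python
from typing import Dict, List, Tuple

Values = Dict[str, float]


def harmonize(a: Values, b: Values, default: float = 0.0
              ) -> Tuple[List[str], List[float], List[float]]:
    """Align two entities on the union of their attributes, filling each
    missing value with a default (implicit) value."""
    attrs = sorted(set(a) | set(b))
    return (attrs,
            [a.get(k, default) for k in attrs],
            [b.get(k, default) for k in attrs])


first = {"doc1": 5.0, "doc2": 3.0}
second = {"doc2": 4.0, "doc3": 1.0}

shared = sorted(set(first) & set(second))  # intersection: only doc2
attrs, x, y = harmonize(first, second)     # union: doc1, doc2, doc3

print(shared)
print(attrs, x, y)
```

With the intersection, only one document is available for comparing the two users. With the union, three documents are compared, and the documents on which only one user voted also inform the match.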

§1.3 Attribute Expansion (with Default Values) for Use 
with Matching Processes 

When determining a degree to which two (2) entities 
"match", the inventors of the present invention have further 
found that, in certain applications, attributes for which 
neither of the two (2) entities have values should be con- 
sidered. Like the aspect of attribute set harmonization dis- 
cussed above, attribute expansion may assign default values 
to the attributes, having no value, of an entity. In an 
application in which the entities are computer users and the 
attributes are content requested, downloaded or rendered by 
such users, this aspect of the present invention was found to 
improve matching determinations, and collaborative 
filtering, particularly when it can be assumed that most users 
will not want to request, download or render most content. 
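
The effect of attribute expansion on a match determination may be sketched in Python as follows. The Pearson correlation is used only as a stand-in match measure, and the function names, the page catalog, and the default value of 0.0 are assumptions made for illustration.

```python
import math
from typing import Dict, Iterable, List, Tuple

Values = Dict[str, float]


def expand(a: Values, b: Values, catalog: Iterable[str],
           default: float = 0.0) -> Tuple[List[float], List[float]]:
    """Align two entities on every attribute in the catalog, including
    attributes for which neither has a value, using a default value."""
    attrs = sorted(set(catalog) | set(a) | set(b))
    return ([a.get(k, default) for k in attrs],
            [b.get(k, default) for k in attrs])


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sx = math.sqrt(sum((p - mx) ** 2 for p in x))
    sy = math.sqrt(sum((q - my) ** 2 for q in y))
    return cov / (sx * sy)


first = {"page1": 1.0}
second = {"page2": 1.0}
catalog = ["page1", "page2", "page3", "page4"]

union_only = expand(first, second, catalog=[])  # harmonization only
expanded = expand(first, second, catalog)       # adds page3 and page4

print(pearson(*union_only))  # about -1
print(pearson(*expanded))    # about -0.33
```

In this sample, the two pages that neither user requested raise the correlation from about -1 to about -0.33. This is consistent with the assumption that most users will not want most content, so that shared non-consumption carries some information.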

§1.4 Tuning 

Recall that a basic collaborative filtering method pre- 
dicted attributes of an active (or new) entity based, in part, 
on an estimated weight between the active entity and other 
entities. The present inventors have found that it is useful to 
tune such weights by emphasizing relatively high weights 
and suppressing relatively low weights. As discussed, a
tuned weight $w'$ may be defined as:

$$w' = \begin{cases} w^{\rho} & \text{if } w \ge 0 \\ -(-w)^{\rho} & \text{if } w < 0 \end{cases} \qquad (3)$$

where

$\rho$ = A tunable parameter, typically greater than 1.
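
The tuning of a weight may be sketched in Python as follows. The function name `tune` and the default of 2.5 for the tunable parameter are hypothetical. The sketch assumes weights in the range of -1 to 1, as is the case for correlation coefficients.

```python
def tune(w: float, rho: float = 2.5) -> float:
    """Tuned weight: w**rho for w >= 0, and -((-w)**rho) for w < 0."""
    return w ** rho if w >= 0 else -((-w) ** rho)


for w in (0.9, 0.5, 0.1, -0.5):
    print(w, round(tune(w, rho=2.0), 4))
```

For weights of magnitude below 1 and a parameter greater than 1, every magnitude shrinks, but small magnitudes shrink proportionally more. With a parameter of 2, a weight of 0.9 is nine times a weight of 0.1 before tuning and eighty-one times after tuning, so the relatively high weight is emphasized while the sign of each weight is preserved.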

The aspects of the present invention may be used in
content push systems, such as certain collaborative filter
systems for example, so that burdens on a user are minimized.
Also, the aspects of the present invention may
operate on a distributed environment, such as a networked
environment including clients and servers. More
specifically, in the context of the Internet, the present invention
may be employed to predict Internet locations or
content that may be of interest to an end user. The present
invention may operate (a) locally, at a client, (b) remotely, at
a server, or (c) at a location remote from the client and the
server (e.g., a separate server). Finally, particular functions
of the present invention may be distributed at one or more
of the above devices (or locations).
§2 High Level Structure of an Apparatus (FIG. 2) 

FIG. 2 is a high level block diagram of an illustrative
embodiment of an apparatus 200 of the present invention.
The apparatus 200 includes an improved collaborative filter
(or an improved entity match determination) engine 202. As
alluded to above, the improved collaborative filter (or entity
match determination) engine 202 operates to accurately
determine the degree to which entities "match" based on
attributes of the entities and/or to accurately predict
attributes of (e.g., content of interest to) an entity based on
attributes of the entity and of other entities.

The improved collaborative filter (or entity match 
determination) engine 202 may employ the above described 
inverse attribute frequency aspect of the present invention to
discount more common entity attributes and emphasize
more rare entity attributes. Further, the improved collaborative
filter (or entity match determination) engine 202 may
employ the attribute set harmonization (with default values),
attribute expansion (with default values) and/or tuning
aspects of the present invention described above. 

Various structures for embodying the improved collabo- 
rative filter (or entity match determination) engine 202 will 
become apparent to those skilled in the art, particularly after 
reviewing details of the operations performed by the
improved collaborative filter (or entity match determination)
engine 202 discussed in detail below. In any event, referring
to FIG. 14, the improved collaborative filter (or entity match
determination) engine 202 may include a computer based
system 1400 including a processor(s) 1402, an input/output
interface(s) 1404, and a storage device(s) 1406, all of which
may share a system bus 1408. The storage device(s) 1406
may include a program storage device or area 1410 for
storing a program which, when executed by the processor
1402, effects the operations described below. The input/
output interface(s) 1404 permits the computer based system
1400 to accept data or commands from an input device(s)
1412 (e.g., a keyboard, mouse, and/or other man-machine
interface) and to provide data or commands to an output
device(s) 1414 (e.g., a video display unit, and/or an audio
unit). 

Referring again to FIG. 2, the improved collaborative 
filter (or entity match determination) engine 202 may com- 
municate with a user interface (e.g., the input device(s) 1412 
and output device(s) 1414) 204.

In this way, the improved collaborative filter (or entity
match determination) engine 202 may, for example, accept
entity attributes and provide suggestions. The improved
collaborative filter (or entity match determination) engine 
202 may also communicate with an entity/attribute data 
record generator/retriever 206. The entity/attribute data 
record generator/retriever 206 may (i) accept entity/attribute 
information from the improved collaborative filter (or entity 
match determination) engine (or alternatively, directly from 
the user interface 204 or an external source), (ii) form and/or 
update entity/attribute record(s) and (iii) store such records 
to the storage device 208. The entity/attribute data record 
generator/retriever 206 may also (i) retrieve entity/attribute 
records from the storage device 208, and (ii) forward the 
retrieved records to the improved collaborative filter (or 
entity match determination) engine. 

The system 200 does not require its own entity/attribute 
record generator 206. In this case, the entity/attribute data 
record generator/retriever will not generate entity/attribute 
records, but rather, will merely retrieve entity/attribute data 
records from a pre-existing database. That is, a third party 
may sell entity/attribute data records or sell access thereto. 
§3 Operation of Apparatus/Methods 

§3.1 First Illustrative Embodiment (FIGS. 3-9)— Inverse 
Attribute Frequency and Attribute Set Harmonization (with 
Default Voting) Applied to a Simple Entity Match Determi- 
nation Process and to a Collaborative Filter Process 

Referring now to FIGS. 3A-3C (the correct alignment of 
the drawing sheets for these figures being shown in FIG. 3), 
the operation of a simple entity match determination process 
and a collaborative filter process, each of which employs the
inverse attribute frequency aspect of the present invention, 
will be disclosed. More specifically, FIGS. 3A-3C are 
collectively a high level flow diagram of an illustrative 
process 300 for determining an attribute value of a new 
entity, which employs the inverse attribute frequency aspect 
of the present invention. In the following, it will be assumed 
that "content" is an "attribute", and the terms "content" and 
"attribute" may be used interchangeably for purposes of 
disclosing the present invention. 

The following description assumes a preexisting entity/ 
attribute database. Referring to FIG. 1A, recall that an entity 
(e.g., a computer user, television viewer, business group, 
etc.) 106 has one or more associated attributes (e.g., age, sex, 
favorite television show, favorite food, network topology, 
industry group, gross sales, etc.) 104. Thus, an entity/ 
attribute database 100 may be formed of entity/attribute(s) 
records 102. Note that the database 100 may be a so-called 
"sparse" database in that many fields of the entity/attribute 
records 102 may be blank (e.g., a particular attribute (e.g., 
age) of a particular entity (e.g., a computer user) may not be 
known). In one embodiment of the process 300, record fields 
with unknown data are left blank (i.e., no attribute set 
harmonization). In an alternative embodiment of the present 
invention, blank record fields are filled in for example, with 
default attribute values (i.e., attribute set harmonization with 
default values). 

Referring back to FIG. 3A, a process to determine
whether a new entity possesses an attribute (or wants content)
300 first adjusts all attribute value(s) based on attribute
frequency information. Initially, as shown in FIG. 3A, all
attribute values are adjusted in accordance with the inverse
attribute frequency aspect of the present invention. First, as
shown in step 302, an attribute or content count loop is
initialized. More specifically, an attribute counter
(ATTR_COUNT) is set to 1 and an attribute count terminator
(END_ATTR_COUNT) is set to the number (e.g., N in
FIG. 1A) of attributes. Next, as shown in steps 304, 306, and
308, for each attribute, the attribute value(s) are adjusted
based on the frequency of the attribute(s) throughout all of
the entities. The step 304 is described in greater detail later
with reference to FIG. 4. As shown in step 306, after an
attribute is adjusted for all entities, a new attribute is
processed (ATTR_COUNT=ATTR_COUNT+1). As
shown in step 308, subsequent attributes are adjusted until 
all of the attributes are processed (ATTR_COUNT>END_ 
ATTR_COUNT). After each of the attributes is adjusted, 
processing continues, via node 310, to step 312. 

Next, referring to FIG. 3B, a match between a new (or 
active) entity and other entities is determined. Processing 
continues, via node 310, and as shown in step 312, entity 
loop values are initialized. More specifically, an entity 
counter (ENTITY_COUNT) is set to 1 and an entity count 
terminator (END_ENTITY_CT) is set to the number (see, 
e.g., M of FIG. 1A) of entity records 102 in the entity/ 
attribute database 100. Next, as shown in steps 314, 316, and 
318, for each entity record 102, the degree to which a new entity
"matches" each existing entity is determined based on the 
adjusted attributes of each. The step 314 is described in 
greater detail later with reference to FIG. 5. As shown in step 
316, after the degree to which the new entity matches an 
existing entity is determined, a degree to which the new 
entity matches a next entity is determined
(ENTITY_COUNT=ENTITY_COUNT+1). As shown in step 318, the
degrees to which other subsequent entities match the new 
entity are determined until the degree of match between the 
new entity and a last entity has been determined (ENTITY_ 
COUNT>END_ENTITY_CT). After the degree of match 
between the new entity and each of the entities has been 
determined, processing continues, via node 320, to step 322. 

Referring now to FIG. 3C, values of attributes, not having 
a value, of the new entity are determined. Processing 
continues, via node 320, and as shown in step 322, attribute 
loop values are initialized. More specifically, an attribute 
counter (ATTR_COUNT) is set to 1 and an attribute count 
terminator (END_ATTR_COUNT) is set to the number
(see, e.g., N of FIG. 1A) of attributes 104 for an entity 106
of the entity/attribute database 100. Next, as shown in steps
324, 326, and 328, for each attribute 104, it is determined
(predicted) whether the new entity will have the attribute.
Alternatively, for each attribute 104, the attribute value of
the new entity is predicted. If the attribute is content, this
determination will predict whether or not (or the degree to
which) the new entity will want the content. The step 324 is
described in greater detail later with reference to FIG. 6. As
shown in step 326, after it is determined (predicted) whether
the new entity has the attribute (or desires the content), a
next attribute is processed to determine whether the new
entity has the attribute (or wants the content)
(ATTR_COUNT=ATTR_COUNT+1). As shown in step 328, the
determination (prediction) of whether or not the new entity 
has an attribute (or wants content) continues for each 
subsequent attribute until the last attribute is processed 
(ATTR_COUNT>END_ATTR_COUNT). Thereafter, 
processing ends at node 330. 

FIGS. 4A and 4B are collectively a flow diagram of a part 
304 of an illustrative attribute (or content) adjustment pro- 
cess used in the process 300 of FIG. 3; the correct alignment 
of the drawing sheets for FIGS. 4A and 4B is shown in FIG. 
4. First, as shown in step 402 of FIG. 4A, an attribute 
frequency value and an entity count are initialized. More 
specifically, the attribute frequency value (ATTR_FREQ) is
set to 0, the entity count (ENTITY_COUNT) is set to 1, and
the entity count terminator (END_ENTITY_CT) is set to
the number (e.g., M in FIG. 1A) of entities 106. Next, as
shown in steps 404, 406, and 408, the frequency of an
attribute over all entities is determined. As shown in step
404, a running sum of the attribute frequency value
(ATTR_FREQ) is formed by adding new attribute values
(ATTRIBUTE_{ATTR_COUNT, ENTITY_COUNT}). Other ways of
determining an adjusted attribute value are also possible. 
Basically, the adjusted attribute value is a function of the 
attribute value and the attribute frequency, and the adjusted 
attribute value decreases as the attribute frequency 
increases. This summing is most appropriate for "qualitative 
attributes" but may also be appropriate for "value attributes" 
having numerical values. Alternatively, a running sum of the 
attribute frequency value (ATTR_FREQ) may be formed by 
incrementing the attribute frequency value when the entity 
has a value, e.g., a value above a certain threshold or a 
certain value, for the attribute. As shown in steps 406 and 
408, this running sum is maintained as the attribute values 
of subsequent entities (ENTITY_COUNT=ENTITY_ 
COUNT+1) are added until the last entity (ENTITY_ 
COUNT>END_ENTITY_CT). Processing continues, via 
node 409, at step 410. 

Then, as shown in steps 410, 412, 414, 416, and 418 of 
FIG. 4B, an adjusted attribute is determined for each 
attribute of each entity by dividing the attribute value by the 
attribute frequency. More specifically, step 410 re-initializes 
an entity count (ENTITY_COUNT=1) as was the case in
step 402 (but the attribute frequency is not re-initialized).
Next, as shown in step 412, an adjusted attribute value for
each entity (ADJUSTED_ATTRIBUTE_{ATTR_COUNT, ENTITY_COUNT})
is determined by dividing the attribute value
for that entity (ATTRIBUTE_{ATTR_COUNT, ENTITY_COUNT})
by the attribute frequency for the attribute over all entities
(ATTR_FREQ_{ATTR_COUNT}). Other ways of determining an
adjusted attribute value are also possible. Basically, the 
adjusted attribute value is a function of the attribute value 
and the attribute frequency, and the adjusted attribute value 
decreases as the attribute frequency increases. As shown in 
steps 414 and 416, which correspond to steps 406 and 408, 
respectively, adjusted attributes are determined for an 
attribute for each entity. Then, as shown at node 418, 
processing continues (e.g., at step 306). Since the process 
304 of FIGS. 4A and 4B is nested in a loop through attributes
(recall steps 302, 304, 306 and 308), adjusted attributes for 
all entities and all attributes are determined. 
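
The adjustment of steps 402 through 416 can be sketched as follows. This is a minimal Python illustration, assuming the entity/attribute records are held as a list of rows with `None` marking an unknown field; the function name `adjust_attributes` and that data layout are assumptions made for the sketch, not part of the disclosure.

```python
from typing import List, Optional

# Rows are entities, columns are attributes; None marks an unknown value.
Table = List[List[Optional[float]]]


def adjust_attributes(table: Table) -> Table:
    """Divide each known attribute value by that attribute's frequency.

    The frequency of an attribute is the running sum of its known values
    over all entities (steps 404-408). Each known value is then divided
    by that sum (steps 410-416). Unknown values stay unknown.
    """
    n_attrs = len(table[0]) if table else 0
    adjusted = [row[:] for row in table]
    for a in range(n_attrs):
        freq = sum(row[a] for row in table if row[a] is not None)
        if freq == 0:
            continue  # no entity has this attribute; nothing to adjust
        for e, row in enumerate(table):
            if row[a] is not None:
                adjusted[e][a] = row[a] / freq
    return adjusted


if __name__ == "__main__":
    # Hypothetical data: attribute 0 is held by both entities, attribute 1 by one.
    print(adjust_attributes([[1.0, None], [1.0, 1.0]]))
```

Dividing by the sum is only one choice. As the text notes, any function that decreases as the attribute frequency increases could be substituted.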

FIG. 5 is a flow diagram of a part 314 of an illustrative
entity match determination process used in the process of
FIGS. 3A-3C. First, as shown in step 502, an attribute
counter and an entity match value are initialized. More
specifically, the attribute counter (ATTR_COUNT) is set to
1, an attribute count terminator (END_ATTR_COUNT) is
set to the number (e.g., N of FIG. 1A) of attributes, and a
match value between a new entity and an entity 106
(MATCH_{ENTITY_COUNT}) is set to zero (0). Next, if, as shown
in step 504, for a given attribute, that attribute of the new
entity (ATTRIBUTE_{ATTR_COUNT, NEW_ENTITY}) "matches"
the attribute of a particular entity
(ATTRIBUTE_{ATTR_COUNT, ENTITY_COUNT}), then, as shown in step 506, the
match between the new entity and the particular entity
(MATCH_{ENTITY_COUNT}) is incremented by the adjusted
attribute count for the particular attribute and entity
(ADJUSTED_ATTRIBUTE_{ATTR_COUNT, ENTITY_COUNT}).

"Attribute matching" (not to be confused with the degree to 
which entities match) is easily determined if the attribute at 
issue is a YES/NO attribute. However, attributes may be 
considered to "match" if both are within a certain numerical 
range (e.g., age 40 to 49) or logical grouping (e.g., favorite 
pastimes of football, baseball, hockey, and tennis may be 
grouped as "sports" or tennis, squash, and racquetball may
be grouped as "racket sports"). If, on the other hand, the
attributes do not "match", as shown in steps 504 and 508, the
match value is not changed, and processing continues for
subsequent attributes.

The above steps are performed for all subsequent
attributes as shown by steps 508 and 510. More specifically,
step 508 looks at the next attribute of the particular entity
(ATTR_COUNT=ATTR_COUNT+1). Step 510 ends this
section of the processing when the last attribute of the
particular entity is considered (ATTR_COUNT>END_ATTR_CT).
Finally, as shown at node 512, processing
continues (e.g., at step 316). Since the process 314 of FIG.
5 is nested in a loop through entities (recall steps 312, 314,
316 and 318), matches between the new entity and each of
the entities are determined.

FIG. 6 is a flow diagram of a part 324 of an illustrative
attribute (or desired content) prediction process used in the
process 300 of FIGS. 3A-3C. First, as shown in step 602, an
entity counter and a prediction of whether the new entity
includes an attribute (or wants content) are initialized. More
specifically, the entity counter (ENTITY_COUNT) is set to
1, an entity count terminator (END_ENTITY_CT) is set to
the number (see, e.g., M of FIG. 1A) of entities, and a
prediction value of whether the new entity includes an
attribute (or wants content) (PRED_{ATTR_COUNT}) is set to zero
(0). Next, as shown in steps 604, 606 and 608, a prediction
is formed for whether the new entity will have an attribute
(or want content) based on adjusted attribute values and
match values between the new entity and all other entities.
More specifically, as shown in steps 604 and 606, a running
sum of a prediction is maintained by adding the product of
a match value between the new entity and a particular entity
(MATCH_{ENTITY_COUNT}) and an adjusted attribute of the
entity (ADJUSTED_ATTRIBUTE_{ATTR_COUNT, ENTITY_COUNT})
while the entities are incremented. As shown in steps
606 and 608, this continues until the prediction is updated
based on (i) the match value between the new entity and the
last entity and (ii) the adjusted attribute of the last entity
(ENTITY_COUNT>END_ENTITY_CT).

After the prediction is updated based on the match value
between the new entity and the last entity and based on the
adjusted attribute of the last entity
(ENTITY_COUNT>END_ENTITY_CT), as shown at node 610, processing
continues (e.g., at step 326). Since the process 324
of FIG. 6 is nested in a loop through attributes (recall steps
322, 324, 326 and 328), predictions for all attributes (or
desired content) for the new entity are determined.

Though the process 324 of FIG. 6 was described as
predicting whether or not an entity has an attribute, it may
also be used to predict the value of attributes of the entity.

A simple example is set forth below to further teach the
operation of the present invention. Referring first to FIG. 1B,
a database 150 includes entity/attribute records 152 for six
(6) entities 156. Each record includes four (4) attributes 154.
A new entity 166 is represented by record 162, and has four
(4) attributes 164 as well. This example may serve to
illustrate the use of the present invention to suggest Internet
pages authored in the hypertext mark-up language (or
"HTML pages") within an Internet site to a computer user
visiting the Internet site. For example, suppose that the
attributes 154 correspond to whether or not a computer user
has visited a particular HTML page of the Internet site of a
computer software company within the past week. Attribute 1
154a can correspond to whether or not a computer user
visited an HTML page which has information regarding the
computer software company's tax preparation product,
attribute 2 154b can correspond to whether or not a computer
user visited an HTML page which has information regarding
the computer software company's e-mail product, attribute 3
154c can correspond to whether or not a computer user
visited an HTML page which has information regarding the
computer software company's spreadsheet product, and
attribute 4 154d can correspond to whether or not a computer
user visited an HTML page which has information regarding
the computer software company's word processing product.

Referring to FIG. 1B, a first entity (user 1) 156a has
attribute 2 154b and attribute 4 154d. That is, the first entity
(user 1) 156a has visited the e-mail and word processing
HTML pages of the computer software company's Internet
site within the last week. The second, third, and fourth
entities (users 2-4) 156b-156d each has attribute 4 154d. That
is, these entities (users 2-4) 156b-156d have visited the word
processing HTML page of the computer software company's
Internet site within the last week. A fifth entity (user 5)
156e has attribute 2 154b. That is, the fifth entity (user 5) 156e
has visited the e-mail HTML page of the computer software
company's Internet site within the last week. Finally, the
sixth entity (user 6) 156f has attribute 1 154a and attribute 3
154c. That is, the sixth entity (user 6) 156f has visited the tax
preparation and spreadsheet HTML pages of the computer
software company's Internet site within the last week.

The new user 166 enters the Internet site of the computer
software company and visits the spreadsheet and word
processor HTML pages. As will become apparent in the
following discussion, the present invention can predict other
HTML pages of the computer software company's Internet
site that the new user 166 may want to visit. As will be
discussed later, these predictions may be presented to the
new user 166 in the form of suggested sites.

Recall from steps 402, 404, 406, and 408 of FIG. 4A, that
an attribute frequency is determined for each attribute. As
shown in FIG. 7A, attribute 1 154a has a frequency of one (1)
across all entities because only user 6 156f has visited the tax
preparation HTML page of the computer software company's
Internet site within the last week. Attribute 2 154b has
a frequency of two (2) across all entities because user 1 156a
and user 5 156e have visited the e-mail HTML page of the
computer software company's Internet site within the last
week. Attribute 3 154c has a frequency of one (1) across all
entities because only user 6 156f has visited the spreadsheet
HTML page of the computer software company's Internet
site within the last week. Finally, attribute 4 154d has a
frequency of 4 across all entities because users 1-4
156a-156d have all visited the word processor HTML page
of the computer software company's Internet site within the
last week.

Recall from steps 410, 412, 414, and 416 of FIG. 4B that
all attributes are adjusted by dividing them by their attribute
frequency value. Thus, as shown in FIG. 7B, the value of
attribute 1 154a is divided by one (1) for each entity, the
value of attribute 2 154b is divided by two (2) for each entity,
the value of attribute 3 154c is divided by one (1) for each
entity, and the value of attribute 4 154d is divided by four (4)
for each entity. As a result, the values of attributes 1 and 3
154a and 154c remain at one (1) at the associated entities,
the values of attribute 2 154b have a value of 0.5 at the
associated entities, and the values of attribute 4 154d have a
value of 0.25 at the associated entities.

Recall from steps 502, 504, 506, 508, and 510 of FIG. 5
that the degree to which each entity 156 and the new entity
166 match is based on a sum of adjusted attributes for
"matching" attributes. Referring now to FIG. 1B, the new
user 166 has visited the HTML pages of the computer
software company's Internet site related to spreadsheets
154c and word processing 154d. Thus, the new user 166 will
match, to some degree, users 1-4 and user 6 since these users
have also visited the HTML page of the computer software
company's Internet site related to spreadsheets 154c or the
HTML page of the computer software company's Internet
site related to word processors 154d. As shown in FIG. 8, the
total match value between the new user 166 and user 6 156f
is one (1) because there is a "match" at attribute 3 and the
adjusted attribute value for attribute 3 is one (1). The total
match value between the new user 166 and each of users 1-4
156a-156d is 0.25 because there is a "match", in each case,
at attribute 4 and the adjusted attribute value for attribute 4 is
0.25. Since there are no "matches" between the new user 166
and user 5 156e, the total match value in this case is zero (0).

Finally, recall from steps 602, 604, 606, and 608 of FIG. 
6 that a prediction that the new user 166 will have a 
particular attribute 154 (or want to visit a particular HTML 
page) is determined by summing products of entity match 
values (see, e.g., FIG. 8) and adjusted attributes (see, e.g., 
FIG. 7B) for all entities 156. Referring now to FIG. 9, the 
adjusted entries correspond to the adjusted attribute entries 
found in FIG. 7B multiplied by the total match value found 
in FIG. 8. A prediction record 900 of the new user 166 is 
presented as the summed products of match value and 
adjusted attributes over all entities 156. Since the new user 
166 already had attributes 3 and 4 — that is, the new user 166 
has already visited the spreadsheet HTML page 154c and the 
word processing HTML page 154d of the computer software 
company's Internet site — those predictions are ignored. 

Interestingly, the present invention predicts that the new 
user is eight (8) times more likely to want to visit the tax 
preparation HTML page 154a of the computer software 
company's Internet site (prediction value of 1) than the 
e-mail HTML page 154b of the computer software company's
Internet site (prediction value of 0.125). Referring back
to FIG. 1B, this is because the HTML spreadsheet page 154c
of the computer software company's Internet site that the
new user 166 had in common with user 6 was, in general,
over all users 156, less popular than the HTML word
processing page 154d of the computer software company's
Internet site that the new user 166 had in common with
users 1-4. In the above example, one can speculate that many
different users use word processing applications while 
mostly accountants use spreadsheet and tax preparation 
applications. To reiterate, the present invention weights 
distinct attributes more heavily when determining similari- 
ties between entities, as was illustrated in the foregoing 
example. 
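
The worked example above can be reproduced with a short Python sketch. The page visits are taken from the description of FIG. 1B. The dictionary layout, the user labels, and the function names are assumptions made for the sketch rather than anything defined by the disclosure.

```python
# Pages visited within the last week (FIG. 1B): 1 = tax preparation,
# 2 = e-mail, 3 = spreadsheet, 4 = word processing.
USERS = {
    "user1": {2, 4},
    "user2": {4},
    "user3": {4},
    "user4": {4},
    "user5": {2},
    "user6": {1, 3},
}
NEW_USER = {3, 4}
PAGES = (1, 2, 3, 4)


def adjusted_values(users):
    """Adjusted value of each page: 1 / (number of users who visited it)."""
    freq = {p: sum(p in visited for visited in users.values()) for p in PAGES}
    return {p: 1.0 / f for p, f in freq.items() if f}


def match_values(users, new_user, adjusted):
    """Sum of adjusted values over the pages that both users visited."""
    return {name: sum(adjusted[p] for p in visited & new_user)
            for name, visited in users.items()}


def predictions(users, new_user, adjusted, match):
    """Per unvisited page: sum over users of match value times adjusted value."""
    return {p: sum(match[name] * adjusted[p]
                   for name, visited in users.items() if p in visited)
            for p in PAGES if p not in new_user}


adjusted = adjusted_values(USERS)
match = match_values(USERS, NEW_USER, adjusted)
pred = predictions(USERS, NEW_USER, adjusted, match)
print(pred)  # tax preparation (1) scores 1.0, e-mail (2) scores 0.125
```

The ratio of the two predictions is eight, matching the discussion above: the match on the rarely visited spreadsheet page outweighs four matches on the commonly visited word processing page.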

§3.2 Second Illustrative Embodiment (FIGS. 15-17): Similarity-Type
Matching Process and Collaborative Filtering Process Using
the Inverse Attribute Frequency Aspect of the
Present Invention

Similarity-type matching and collaborative filtering appli- 
cations which employ the inverse attribute frequency aspect 
of the present invention will be described with reference to 
FIG. 15. Thereafter, the operation of the collaborative fil- 
tering application (and its underlying similarity-type match- 
ing process) on exemplary data will be illustrated with 
reference to FIGS. 16A through 16G, and 17A through 17K. 

In a similarity-type matching process for determining a 
similarity between two entities, each entity may be thought 
of as a vector in z-dimensional space (where z is a number 
of attributes) and the similarity between the two entities may 
be thought of as a ("cosine-product" or "dot-product") 
distance between the two vectors. First, the values (v_ij) of
the attributes are normalized to provide normalized attribute
values (q_ij) as follows:

$$q_{ij} = \frac{v_{ij}}{\sqrt{\sum_{k \in I_i} v_{ik}^{2}}} \qquad (4)$$

The similarity between the first and second entities can be
defined as:

$$s_{12} = \sum_{j} q_{1j}\, q_{2j} \qquad (5)$$

where:

v_ij = the value of the j-th attribute of entity i.
q_ij = the normalized value of the j-th attribute of entity i.
I_i = the set of attribute indexes, having values, for entity i.

A normalized weight value (w_12) can be determined from
the similarity values as follows:

(6)
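
The normalization and similarity described above can be sketched in Python as follows. The sketch assumes each entity is a dictionary holding only the attributes that have values, so an attribute missing from either entity contributes nothing to the sum. The function names are illustrative assumptions.

```python
import math


def normalize(values: dict) -> dict:
    """Scale an entity's known attribute values to unit length."""
    norm = math.sqrt(sum(v * v for v in values.values()))
    if norm == 0:
        return {j: 0.0 for j in values}
    return {j: v / norm for j, v in values.items()}


def similarity(entity_1: dict, entity_2: dict) -> float:
    """Sum of products of normalized values over the shared attributes."""
    q1, q2 = normalize(entity_1), normalize(entity_2)
    return sum(q1[j] * q2[j] for j in q1.keys() & q2.keys())


if __name__ == "__main__":
    # Hypothetical ratings, used only to exercise the functions.
    print(similarity({"D1": 3.0, "D2": 4.0}, {"D1": 3.0, "D2": 4.0}))
    print(similarity({"D1": 5.0}, {"D2": 5.0}))
```

Two entities with proportional values score close to 1, and two entities with no attribute in common score 0.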



FIG. 15 is a high level flow diagram of the collaborative 
filtering process 1500 (and underlying similarity-type 
matching process) which employs the inverse attribute fre- 
quency aspect of the present invention. In the following 
discussion, it is assumed that certain entities having certain 
attributes, and a new entity having certain attributes, are 
available for processing. Certain preliminary steps are first 
described with reference to steps 1502 and 1504. First, as 
shown in step 1502 entity records are harmonized, if 
necessary, to provide common attribute fields. Referring 
back to FIG. 1B, for example, user 1 only visited the E-mail
web page and the Word Processor web page while user 6 only
visited the Tax Preparer web page and the Spreadsheet web 
page. In a database, the records for these users would be 
harmonized so that they include all four web page attribute 
fields. Next, as shown in step 1504 shown in FIG. 15, the 
number of total entities and the number of entities having an 
attribute value (or an attribute value above a threshold) for 
an attribute field are determined. The details of this deter- 
mination are discussed in more detail with reference to FIG. 
16A below. 

Once the preliminary steps are complete, an inverse 
attribute frequency (or "IAF") value is determined for each 
attribute as shown in step 1506 shown in FIG. 15. The IAF 
value for an attribute is defined as the natural log (LN) of the 
result of the total number of entities divided by the number 
of entities having an attribute value in an attribute field 
(excluding default values). This step 1506 is described in 
more detail with reference to FIG. 16B below. Other inverse 
attribute functions, based on the number of entities having 
an attribute value in an attribute field, and which decrease as 
the number of entities having an attribute value in an
attribute field increases, may also be used.
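
As a small, hedged illustration of the IAF definition above, the following Python sketch computes the natural log of the total number of entities divided by the number of entities having a value for the attribute. The function name and the error handling for an attribute that no entity holds are assumptions of the sketch.

```python
import math


def inverse_attribute_frequency(total_entities: int,
                                entities_with_value: int) -> float:
    """Return ln(total_entities / entities_with_value).

    Default values are assumed to have been excluded from
    entities_with_value before this function is called.
    """
    if entities_with_value <= 0:
        raise ValueError("IAF is undefined when no entity has a value")
    return math.log(total_entities / entities_with_value)


if __name__ == "__main__":
    # Hypothetical counts for a database of six entities.
    for n_with_value in (1, 3, 6):
        print(n_with_value, inverse_attribute_frequency(6, n_with_value))
```

An attribute for which every entity has a value receives an IAF of zero, so it contributes nothing once attribute values are multiplied by their IAF values.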

Next, as shown in optional step 1508, in FIG. 15, default 
attributes are added. More specifically, if an entity does not 
have a value in an attribute field, a default value may be 
added. 

Thereafter, as shown in step 1510, the entity records are
transformed based on the IAF values to provide adjusted
attribute values (e.g., v'_ij = v_ij log(n/n_j), where n is the total
number of entities and n_j is the number of entities having a
value for attribute j), and the sum of squares of the adjusted
attribute values is determined for each entity. More
specifically, the adjusted attribute values are determined by
multiplying the original attribute values by the IAF value for
the particular attribute.
This step 1510 is described in more detail with reference to
FIG. 16C below.

The new entity having certain attributes is now processed. 
Since the collaborative filtering process 1500 predicts the 
existence or values of certain attributes of the new entity, the 
new entity may be referred to as a "query". As shown in step 
1512 shown in FIG. 15, the new entity (or query) is 
transformed based on the IAF values. That is, each attribute 
of the new entity is multiplied by the corresponding IAF 
value. Next, as shown in step 1514, the sum of the squares 
of the adjusted attribute values of the new entity is also 
determined. The square root of the sum of squares is then 
determined. These steps 1512 and 1514 are described in 
more detail with reference to FIG. 16D below. 

As shown in step 1516 shown in FIG. 15, normalized 
entity attribute values of the new entity are then determined 
by dividing the adjusted attributes of the new entity by the 
square root of the sum of squares of the attributes of the new 
entity. 



This step 1516 is described in more detail with reference to 
FIG. 16E below. 

Next, as shown in step 1518 in FIG. 15, similarities
between the new entity and each of the existing entities are
determined (e.g., s_{i,new} = Σ_j q_{i,j} · q_{new,j}) and case weights for
each of the entities are determined.

More specifically, a similarity value is determined as a sum 
of products of all adjusted attribute values for an entity and 
the normalized attribute values of the new entity. The case 
weight for an entity is defined as the similarity for that entity 
divided by the sum of squares of adjusted attribute values for 
that entity. Next, as shown in step 1520, a total case weight 
value is determined. The total case weight value is defined 
as the sum of the case weights for all of the entities. These 
steps 1518 and 1520 are described in more detail with 
reference to FIG. 16F below. 

Next, as shown in step 1522 in FIG. 15, new attribute 
values for each entity are determined based on the original 
attribute values and the case weights for each entity. More 
specifically, a new attribute value is defined as the product 
of the original attribute value and the case weight of the 
entity. The new attribute values are summed over all entities, 
as shown in step 1524, to determine a total. Finally, as
shown in step 1526, predicted attributes for the new entity
are determined based on the total for the attribute and the 
total case weight. More specifically, the predicted attribute is 
defined as the total of that attribute divided by the total case 
weight. These steps 1522, 1524, and 1526 are described in 
more detail with reference to FIG. 16G below. 
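
The steps of FIG. 15 can be sketched end to end in Python as follows. This is a minimal sketch that follows the prose above literally, including the case weight being the similarity divided by the sum of squares. It assumes that an unknown attribute value is read as zero, which is only one possible default. The record layout, the function name `predict`, and the sample ratings are hypothetical.

```python
import math
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]  # attribute name -> value, None if unknown


def predict(entities: List[Record], query: Record) -> Dict[str, float]:
    """Follow steps 1502-1526 as described, reading unknown values as 0."""
    attrs = sorted({a for r in entities + [query] for a in r})  # step 1502
    n = len(entities)

    # Steps 1504-1506: IAF = ln(n / number of entities with a value).
    iaf = {}
    for a in attrs:
        n_a = sum(1 for r in entities if r.get(a) is not None)
        iaf[a] = math.log(n / n_a) if n_a else 0.0

    def adjust(record: Record) -> Dict[str, float]:
        return {a: (record.get(a) or 0.0) * iaf[a] for a in attrs}

    # Steps 1512-1516: adjust the query, then scale it to unit length.
    q = adjust(query)
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    q = {a: (v / q_norm if q_norm else 0.0) for a, v in q.items()}

    totals = {a: 0.0 for a in attrs}
    total_weight = 0.0
    for record in entities:
        adj = adjust(record)                      # step 1510
        sum_sq = sum(v * v for v in adj.values())
        sim = sum(adj[a] * q[a] for a in attrs)   # step 1518
        weight = sim / sum_sq if sum_sq else 0.0  # case weight
        total_weight += weight                    # step 1520
        for a in attrs:                           # steps 1522-1524
            totals[a] += (record.get(a) or 0.0) * weight

    # Step 1526: predicted value = attribute total / total case weight.
    if total_weight == 0:
        return {a: 0.0 for a in attrs}
    return {a: totals[a] / total_weight for a in attrs}


if __name__ == "__main__":
    users = [{"D1": 5.0, "D2": 4.0}, {"D1": 1.0, "D3": 5.0},
             {"D2": 2.0, "D3": 1.0}]
    print(predict(users, {"D1": 5.0}))
```

In this hypothetical data the first user rates D1 the same way the query does, so that user's rating of D2 dominates and D2 is predicted higher than D3.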
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Having had described the operation of the collaborative 
filtering process 1500 of FIG. 15 at a high level, the 
operation of this process 1500 on exemplary entity records 
and a new entity is described with reference to FIGS. 16 A 

5 through 16 G, and 17 A through 17K. In the example set forth 
below, it is assumed that the entities are users, that the 
attributes are movies, and the attribute values are explicitly 
entered ratings between 0 and 5. Naturally, the entities and 
attribute values could represent other things. For example, 
the attribute values could be the number of times a user 
visited an Internet HTML page within the last week. FIG. 
17A is a table which illustrates the attribute values for 
attributes D1 through D6 of entities (users) C1 through C6. 
If no attribute value is known, a question mark is 
depicted. For example, in the context of movie ratings, user 
C4 did not see, or saw but did not vote on, movies D2 and 
D5. FIG. 17E depicts the attribute values of attributes of a 
new entity (user). 

Recall from step 1502 of FIG. 15 that the entity records 
are harmonized, if necessary, to provide common attribute 
fields. In the context of the new user of FIG. 17E, attribute 
fields for attributes D4, D5, and D6 are added. 

Referring to FIGS. 16A, 17A, and 17B, recall from step 
1504 of FIG. 15 that the number of total entities and the 
number of entities having an attribute value for an attribute 
field are determined. More specifically, as shown in step 
1604 of FIG. 16A, processing loops for processing each 
entity and each attribute are initialized. Referring to the table 
of FIG. 17A, an attribute count will loop from 1 to 6 and an 
entity count will also loop from 1 to 6. As shown in step 
1606, a total entities with attribute value 
(TOTAL_ENTITIES_W_ATTR_VALUE) is initialized for a given 
attribute by setting it to zero (0). Next, as shown in steps 
1608, 1610, and 1612, the total entities with attribute value 
for an attribute is determined over all entities by incrementing 
the value by an attribute vote value (ATTRIBUTE_VOTE). 
The attribute vote value (ATTRIBUTE_VOTE) is 
defined to be one (1) if the attribute of the entity has an 
attribute value and zero (0) if the attribute of the entity does 
not have an attribute value (i.e., "?"). As shown in steps 1614 
and 1616, a total entities with attribute value 
(TOTAL_ENTITIES_W_ATTR_VALUE) is determined for each 
attribute. 

The resulting total entities with attribute value 
(TOTAL_ENTITIES_W_ATTR_VALUE) for each attribute are 
depicted in FIG. 17B. For example, since entities (users) C1 
and C3 through C6 have an attribute value (e.g., a vote) for 
attribute (e.g., movie) D1, the total entities with attribute 
value for D1 is 5. The total entities with attribute values for 
the other attributes are similarly determined as can be seen 
by reviewing FIGS. 17A and 17B. 
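The counting loop of FIG. 16A can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, unknown values ("?") are represented as `None`, and the three-user table is made-up sample data rather than the contents of FIG. 17A.

```python
def count_entities_with_value(records, attributes):
    """Return, per attribute, how many entities have a known value."""
    counts = {}
    for attribute in attributes:
        total = 0  # TOTAL_ENTITIES_W_ATTR_VALUE starts at zero
        for values in records.values():
            # ATTRIBUTE_VOTE is 1 when a value is known, 0 for "?"
            attribute_vote = 1 if values.get(attribute) is not None else 0
            total += attribute_vote
        counts[attribute] = total
    return counts


# Hypothetical sample data (not the table of FIG. 17A).
records = {
    "C1": {"D1": 2, "D2": None},
    "C2": {"D1": None, "D2": 3},
    "C3": {"D1": 5, "D2": 1},
}
counts = count_entities_with_value(records, ["D1", "D2"])
```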

Referring to FIGS. 16B, 17B, and 17C, recall from step 
1506 of FIG. 15 that an inverse attribute frequency (IAF) 
value is determined for each attribute. As shown in step 1622 
of FIG. 16B, an attribute count loop is initialized to count 
through attributes 1 through 6. As shown in step 1624, an 
inverse attribute frequency value (IAF) for a particular 
attribute is determined by taking the natural log of the result 
of dividing the total number of entities by the total entities 
with attribute value (TOTAL_ENTITIES_W_ATTR_VALUE) 
for that particular attribute. As shown by steps 
1626 and 1628, step 1624 is repeated for each attribute. To 
reiterate, other functions for forming an inverse attribute 
frequency value may be used. Typically, the inverse attribute 
frequency value will decrease as the total entities with an 
attribute value increases. 

The results of this processing are depicted in FIG. 17C. 
For example, the inverse attribute frequency value (IAF) for 
attribute D4 is 0.4054651 (=ln(6/4)). The inverse attribute 
frequency values for the other attributes are similarly 
determined as can be seen by reviewing FIGS. 17B and 17C. 
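The inverse attribute frequency computation of step 1624 reduces to a single logarithm. The sketch below uses a hypothetical function name and reproduces the D4 value quoted above. The counts 5 and 3 are inferred from IAF values that appear in the worked example (0.1823216 and 0.69314718).

```python
import math


def inverse_attribute_frequency(total_entities, entities_with_value):
    """IAF = ln(total entities / entities having a value for the attribute)."""
    return math.log(total_entities / entities_with_value)


iaf_d4 = inverse_attribute_frequency(6, 4)      # attribute rated by 4 of 6 users
iaf_5_of_6 = inverse_attribute_frequency(6, 5)  # attribute rated by 5 of 6 users
iaf_3_of_6 = inverse_attribute_frequency(6, 3)  # attribute rated by 3 of 6 users
```

Note that an attribute rated by every entity gets an IAF of ln(1) = 0, so it contributes nothing to a match.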

Referring now to FIGS. 16C, 17A, 17C and 17D, recall 
from step 1510 of FIG. 15 that transformed records having 
adjusted attribute values are determined and a sum of squares 
of the adjusted attribute values is determined for each 
entity. (See FIG. 17D.) More specifically, as shown in step 
1631 of FIG. 16C, processing loops for processing each 
entity and each attribute are initialized; an attribute count 
will loop from 1 to 6 and an entity count will also loop from 
1 to 6. Next, as shown in step 1632, a sum of squares value 
for an entity (SUM_SQRS) is initialized by setting it to zero 
(0). Thereafter, as shown in steps 1633, 1635, and 1636, an 
adjusted attribute value (ADJ_ATT_VALUE) for all 
attributes of an entity is determined by multiplying the 
original attribute value (ATTRIBUTE_VALUE) by the 
inverse attribute frequency value (IAF) for that attribute. 
Within the same loop of attributes, as shown in steps 1634, 
1635, and 1636, a running set of sum of squares values 
(SUM_SQRS) is determined by adding the square of the 
adjusted attribute value (ADJ_ATTR_VALUE) to a previous 
sum of squares value (SUM_SQRS). At the end of the loop 
through the attributes, the last sum of the running sum of 
squares will be the final sum of squares (SUM_SQRS) for 
a particular entity. As shown in steps 1637, 1638, and 1632, 
the steps of determining adjusted attribute values 
(ADJ_ATT_VALUE) across all attributes of an entity (step 1633) 
and a final sum of squares value (SUM_SQRS) for the 
entity (step 1634), are repeated for each entity. 

Given the original attribute values of FIG. 17A and the 
inverse attribute frequency values of FIG. 17C, the adjusted 
attribute values and sum of square values of FIG. 17D result 
(assuming unknown attribute values are assigned a default 
value of zero (0)). Take, for example, the entity C1; the 
adjusted attribute value for attribute D1 is 0.365 
(≈2*0.1823216), the adjusted attribute value for attribute D5 is 
0.729 (≈4*0.1823216) and the sum of squares value is 
0.664823 (≈0.365² + 0.729²). The other adjusted attribute 
values and sum of squares of the adjusted attribute values of an 
entity are similarly determined as can be seen by reviewing 
FIGS. 17A, 17C, and 17D. 
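The transformation of FIG. 16C can be sketched as below. The function name is hypothetical, and the record for C1 lists only attributes D1 and D5, since those are the only C1 values the text states. Unknown values default to zero, as assumed in the example above.

```python
import math


def adjust_record(record, iaf):
    """Scale each attribute value by its IAF and accumulate the sum of squares."""
    adjusted = {}
    sum_sqrs = 0.0  # SUM_SQRS starts at zero for each entity
    for attribute, weight in iaf.items():
        value = record.get(attribute) or 0.0  # unknown ("?") treated as 0
        adjusted[attribute] = value * weight
        sum_sqrs += adjusted[attribute] ** 2
    return adjusted, sum_sqrs


# Entity C1 from the worked example: D1 = 2, D5 = 4, both with IAF ln(6/5).
iaf = {"D1": math.log(6 / 5), "D5": math.log(6 / 5)}
adjusted_c1, sum_sqrs_c1 = adjust_record({"D1": 2, "D5": 4}, iaf)
```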

Referring now to FIGS. 16D, 17C, 17E, 17F, and 17G, 
recall from steps 1512 and 1514 of FIG. 15 that the new 
entity having certain attributes is processed to generate a 
transformed record of adjusted attributes and the square root 
of the sum of the squares of the adjusted attribute values is 
determined. First, as shown in step 1641 of FIG. 16D, a loop 
to process each of the attributes of the new entity is 
initialized; an attribute count will loop from 1 to 6. Next, as 
shown in step 1642, a sum of squares value (SUM_SQRS) 
for the new entity is initialized by setting it to zero (0). 
Thereafter, as shown in steps 1643, 1645, and 1646, an 
adjusted attribute value (ADJ_ATT_VALUE) for all 
attributes of the new entity is determined by multiplying the 
original attribute value (ATTRIBUTE_VALUE) by the 
inverse attribute frequency value (IAF) for that attribute. 
Within the same loop of attributes, as shown in steps 1644, 
1645, and 1646, a running set of sum of squares values 
(SUM_SQRS) is determined by adding the square of the 
adjusted attribute value (ADJ_ATTR_VALUE) to a previous 
sum of squares value (SUM_SQRS). At the end of the 
loop through the attributes, the last sum of the running sum 
of squares will be the final sum of squares (SUM_SQRS) 
for the new entity and a square root of the sum of squares for 
the new entity is determined (step 1647). 

Referring to FIGS. 17C, 17E, 17F, and 17G, take, for 
example, the second attribute of the new entity; the adjusted 
attribute value of the second attribute is 2.07944154 
(=3*0.69314718). The other adjusted attribute values of the 
new entity are similarly determined as can be seen by 
reviewing FIGS. 17C, 17E and 17F. The square root of the 
sum of squares is 3.0298536 (=SQRT(0.7292862² + 
2.07944154² + 2.07944154²)). 

Referring now to FIGS. 16E, 17F, 17G and 17H, recall 
from step 1516 of FIG. 15 that normalized attribute values 
of the new entity are determined by dividing the adjusted 
attribute values of the new entity by the square root of the 
sum of squares of adjusted attribute values of the new entity. 
More specifically, as shown in step 1652 of FIG. 16E, a loop 
to process each of the attributes is initialized; an attribute 
count will loop from 1 to 6. Next, as shown in step 1654, for 
each attribute, a normalized attribute value (NORM_ 
ATTR_VALUE) of an attribute of the new entity is deter- 
mined by dividing the adjusted attribute value (ADJ_ 
ATTR_VALUE) of the attribute of the new entity by the 
square root of the sum of squares of the adjusted attributes 
of the new entity. As shown in steps 1656 and 1658, this 
determination is repeated for each attribute. 

Referring to FIGS. 17F, 17G, and 17H, take, for example, 
the first attribute of the new entity; the normalized attribute 
value is 0.2407002 (=0.7292862/3.0298536). Normalized 
values of the other attributes of the new entity are similarly 
determined as can be seen by reviewing FIGS. 17F, 17G, and 
17H. 
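The normalization of the new entity (steps 1514 and 1516) amounts to dividing a vector by its Euclidean length. A minimal sketch follows, with a hypothetical function name and the adjusted values quoted in the worked example (attributes D4 through D6 are unknown for the new entity and are taken as zero).

```python
import math


def normalize(adjusted_values):
    """Divide each adjusted value by the square root of the sum of squares."""
    norm = math.sqrt(sum(v * v for v in adjusted_values))
    return [v / norm for v in adjusted_values], norm


# Adjusted attribute values D1..D6 of the new entity (worked example).
new_entity_adjusted = [0.7292862, 2.07944154, 2.07944154, 0.0, 0.0, 0.0]
normalized, norm = normalize(new_entity_adjusted)
```

After this step the new entity's vector has unit length, so the similarity computed next depends only on its direction, not on how many attributes it has rated.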

Referring now to FIGS. 16F, 17D, 17H and 17I, recall 
from step 1518 of FIG. 15 that similarities and case weights 
are determined for each entity. More specifically, as shown 
in step 1661 of FIG. 16F, processing loops for processing 
each entity and each attribute are initialized; an attribute 
count will loop from 1 to 6 and an entity count will also loop 
from 1 to 6. Moreover, a total case weight value 
(TOTAL_CASE_WT) is initialized by setting it to zero (0). Next, as 
shown in step 1662, a similarity value (SIM) between the 
new entity and an entity is initialized. Thereafter, as shown 
in steps 1663, 1664, and 1665, within a loop of attributes, a 
running set of similarities (SIM) is determined by adding the 
product of an adjusted attribute value (ADJ_ATTR_VALUE) 
of the attribute of the entity and the normalized 
attribute value (NORM_ATTR_VALUE) of the attribute of 
the new entity to a previous similarity value (SIM). As 
shown in steps 1666, 1668, and 1669, for each entity, a case 
weight (CASE_WT) is determined by dividing the final of 
the running set of sums of similarities (SIM) for the entity 
by the square root of the sum of squares of adjusted attribute 
values (SUM_SQRS) of the entity. Moreover, recall from 
step 1520 of FIG. 15 that a total case weight value is 
determined. As shown in steps 1667, 1668, and 1669, a 
running sum of a total case weight value (TOTAL_CASE_WT) 
is kept while looping through the entities. The total 
case weight (TOTAL_CASE_WT) is determined as the 
sum of the previous total case weight (TOTAL_CASE_WT) 
and the case weight (CASE_WT) of the current entity. 

Referring to FIGS. 17D, 17H, and 17I, take, for example, 
the first entity C1; the similarity (SXY) between entity C1 
and the new entity is 0.088 (=(0.365*0.2407002)+(0.729*0)) 
and the case weight is 0.10764438 (=0.088/SQRT 
(0.664823)). The similarities and case weights of the other 
entities are similarly determined as can be seen by reviewing 
FIGS. 17D, 17H, and 17I. 
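The similarity and case weight of FIG. 16F can be sketched as a dot product followed by a division. The function name is hypothetical; the inputs are the C1 figures from the worked example, restricted to attributes D1 and D5 (the only non-zero adjusted values stated for C1).

```python
import math


def similarity_and_case_weight(adjusted_entity, sum_sqrs, normalized_new):
    """Return (SIM, CASE_WT) for one databased entity against the new entity."""
    # SIM: running sum of adjusted value * normalized value over attributes.
    sim = sum(a * n for a, n in zip(adjusted_entity, normalized_new))
    # CASE_WT: similarity divided by the square root of the entity's SUM_SQRS.
    return sim, sim / math.sqrt(sum_sqrs)


# Entity C1 over (D1, D5); the new entity has no value for D5.
sim_c1, case_wt_c1 = similarity_and_case_weight(
    [0.3646432, 0.7292864], 0.664823, [0.2407002, 0.0]
)
```

Because the new entity's values were already normalized, dividing by the square root of the databased entity's sum of squares makes the case weight the cosine of the angle between the two adjusted vectors.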

Referring now to FIGS. 16G, 17A, 17I, 17J, and 17K, 
recall from step 1522 of FIG. 15 that the attribute values of 
the original entity/attribute records are transformed to new 
attribute values based on the case weights. More specifically, 
as shown in step 1671 of FIG. 16G, processing loops for 
processing each entity and each attribute are initialized; an 
attribute count will loop from 1 to 6 and an entity count will 
also loop from 1 to 6. Moreover, recall from step 1524 of 
FIG. 15 that a sum of the new attribute values is determined. 
More specifically, as shown in step 1672, a total attribute 
value (TOTAL) of an attribute is initialized by setting it to 
zero (0). Next, as shown in steps 1673, 1675, and 1676, for 
each entity, a new attribute value (NEW_ATTR_VALUE) 
of an attribute is determined as the product of the original 
attribute value (ATTRIBUTE_VALUE) and the case weight 
(CASE_WT) of the entity. Moreover, recall from step [illegible] 
of FIG. 15 that predicted attributes of the new entity are 
determined. More specifically, as shown in steps [illegible] 
and 1676 in FIG. 16G, a running sum (TOTAL) of the new 
attribute values (NEW_ATTR_VAL) for a particular 
attribute is kept while looping through the entities. The total 
(TOTAL) is determined as the sum of the previous total 
(TOTAL) and the new attribute value (NEW_ATTR_VAL) 
of the current entity. Thereafter, as shown in steps 1676 and 
1677, at the end of each entity loop, a predicted attribute 
value (SCORE) is determined for an attribute of the new 
entity by dividing the sum of the new attribute values 
(TOTAL) by the total case weight. As shown in steps 1678 
and 1679, the loop through the entities during which new 
attribute values (NEW_ATTR_VALUE) and running totals 
are determined, and the predicted attribute value (SCORE) 
determination is repeated for each attribute. 

Referring to FIGS. 17A, 17I, 17J, and 17K, take attribute 
D5, for example. The new attribute D5 for the first entity C1 
is 0.4305775 (=4*0.10764438), the attribute D5 for the 
second entity C2 is 0.7592906 (=1*0.75929062), the attribute 
D5 for the third entity C3 is 4.5015319 (=5*0.90030639), the 
attribute D5 for the fourth entity C4 is 0 (=0*0.62568901), 
the attribute D5 for the fifth entity C5 is 0.5922505 
(=1*0.59225052), and the attribute D5 for the sixth entity C6 
is 0 (=0*0.09133079). The other new attribute values are 
similarly determined as can be seen by reviewing FIGS. 
17A, 17I, and 17J. The total of these new attribute values of 
attribute D5 over all entities is 6.2836506. The other new 
attribute value totals for all entities are similarly determined. 
Finally, the prediction of attribute D5 for the new entity is 
2.0424595 (=6.2836506/3.07651171). The predictions of 
other attributes of the new entity are similarly determined as 
can be seen by reviewing FIGS. 17I, 17J, and 17K. 
However, since, as shown in FIG. 17E, the values of the 
attributes D1, D2, and D3 of the new entity were already 
known, the predictions of the attribute values for these 
attributes are either not made, or made and discarded. Such 
instances are depicted as "Given" in FIG. 17K. 

§3.3 Third Illustrative Embodiment: Correlation-type 
Matching and Collaborative Filtering Processes Using 
Inverse Attribute Frequency, Attribute Set Harmonization 
(with Default Voting), Attribute Expansion (with Default 
Voting), and Tuning Aspects of the Present Invention (FIGS. 
18 and 19) 

The following will describe (i) determining a correlation 
between entities, (ii) determining a correlation between 
entities with filler (i.e., harmonization and/or expansion) 
attributes, (iii) tuning (i.e., amplifying and/or suppressing) 
correlation determinations, and (iv) ranking attributes. FIG. 
18 is a high level flow diagram of a process 1800 which 
includes each of the above-mentioned procedures. FIGS. 
19A through 19C provide an example for illustrating 
harmonizing entity/attribute records and for adding filler 
attributes to entity/attribute records. 

§3.3.1 Determining Correlation (Adjusted based on 
Inverse Attribute Frequency and Attribute Set Harmonization 
Aspects of the Invention) 

FIG. 18 is a high level flow diagram of a correlation 
determination and attribute ranking processes 1800 which 
are carried out in accordance with the present invention. 
FIG. 19A illustrates an exemplary entity/attribute record of 
a new entity 1902 and an exemplary entity/attribute record 
of a databased entity 1904. In this example, the entities are 
television viewers, the attributes are television programs, 
and the attribute values are ratings entered by the user. 

First, as shown in step 1802, the entity/attribute records 
1902 and 1904 are harmonized, if necessary, so that the 
entity/attribute records 1902 and 1904 have common 
attribute fields. For example, referring to FIG. 19A, the 
entity/attribute record 1902 has a rating for 60 Minutes but 
has no rating for X-Files, while the entity/attribute record 
1904 has a rating for X-Files but has no rating for 60 
Minutes. FIG. 19B shows the harmonized entity/attribute 
records 1902' and 1904'. Note that a default rating value (in 
this case 1) is assigned to "extension attributes" added to the 
entity/attribute records. 

Since steps 1804 and 1806 are optional (as indicated by 
the phantom lines), these steps will be described later. 
Assuming that the optional steps 1804 and 1806 are not 
performed, a correlation value "r" between an a-th entity and 
an i-th entity is determined as shown in step 1808. The 
correlation value "r_{a,i}" may be determined based on the 
following equation: 

$$
r_{a,i} = \frac{N \sum_j v_{a,j} v_{i,j} - \sum_j v_{a,j} \sum_j v_{i,j}}{\sqrt{N \sum_j v_{a,j}^2 - \left(\sum_j v_{a,j}\right)^2}\,\sqrt{N \sum_j v_{i,j}^2 - \left(\sum_j v_{i,j}\right)^2}} \qquad (7)
$$

where: 

v_{a,j} = The value of the j-th attribute of the a-th entity. 
v_{i,j} = The value of the j-th attribute of the i-th entity. 
N = The number of attributes with values in a set of 
common values between the a-th and i-th entities. 

If the values v_{a,j} and v_{i,j} are adjusted based on the attribute 
frequency value as discussed above (i.e., such that they 
decrease as the number of entities having a value for the j-th 
attribute increases), then the process for determining the 
correlation value r_{a,i} will have used the inverse attribute 
frequency aspect of the invention. 

The weight, a normalized version of the correlation value r, 
may be expressed as: 

$$
w_{a,i} = \frac{r_{a,i}}{\sum_i \left| r_{a,i} \right|} \qquad (8)
$$

§3.3.2 Determining Correlation with Default Filler 
Attributes (Expansion Aspect of the Invention) 

In the examples discussed above with reference to FIGS. 
19A and 19B, "extension attributes" were added so that the 
entity/attribute records 1902 and 1904 would have a common 
set of attributes (e.g., Mash, CBS News, 60 Minutes 
and X-Files). As discussed above, the attribute expansion 
aspect of the present invention permits match (e.g., 
correlation) determinations to be made based on attributes 
("filler attributes" or "expansion attributes") that neither 
entity/attribute record includes. A default attribute value 
(e.g., one (1)) is assigned to each of the filler attributes. 
Although basing a match (e.g., correlation) determination on 
"manufactured" information is counterintuitive, the present 
inventors have found that the use of such "filler attributes" 
aids correlation when most entities (or users) can be 
assumed to find most attributes (or content) not useful. 
Indeed, in the context of recommending content to a user, the 
present inventors have found that the use of such "filler 
attributes" often improves content recommendations, they 
believe, because the "filler attributes" provide a background 
or reference that, in effect, highlights "matching" attributes 
(e.g., votes). 

As shown in step 1804 of FIG. 18, conceptually, entity/ 
attribute records (e.g., 1902' and 1904') are expanded so that 
they include attributes that neither included before. Next, as 
shown in step 1806, default attribute values are added to the 
"filler attributes". Referring to FIGS. 19B and 19C, if N2 
filler attributes are added to the entity/attribute records 1902' 
and 1904' of FIG. 19B, and a default attribute value of one 
(1) is assigned to each of the filler attributes, the entity/ 
attribute records 1902" and 1904" of FIG. 19C result. 

The process of adding filler attributes having default 
attribute values to entity/attribute records can be used with 
any matching or collaborative filtering process. This process 
tends to reduce the number of explicitly matched attributes 
needed to predict an attribute (e.g., recommend content) 
because it tends to improve predictions (e.g., 
recommendations) when the number of explicitly matched 
attributes is small. Thus, this aspect of the present invention 
overcomes the bootstrapping problem of the GroupLens 
system discussed above; encouraged by good initial 
predictions or recommendations, users are more likely to explicitly 
enter attributes (e.g., ratings votes). 

Although the process of adding filler attributes having 
default attribute values was described as altering entity/ 
attribute records, this process can be effected without 
explicitly enumerating the additional attributes and modifying the 
entity/attribute records. More specifically, the process can be 
carried out by the following formula for determining the 
correlation "r" which equals: 

$$
r = \frac{\left(\sum f + N_2\right)\left(N_2 d^2 + \sum fXY\right) - \left(N_2 d + \sum fX\right)\left(N_2 d + \sum fY\right)}{\sqrt{\left(\left(\sum f + N_2\right)\left(N_2 d^2 + \sum fX^2\right) - \left(N_2 d + \sum fX\right)^2\right)\left(\left(\sum f + N_2\right)\left(N_2 d^2 + \sum fY^2\right) - \left(N_2 d + \sum fY\right)^2\right)}}
$$

where: 

X = A vector of attribute values (e.g., an array having 
values of 100, 17, 18, and 1) of the harmonized new 
entity/attribute record 1902'. 

Y = A vector of attribute values (e.g., an array having 
values 20, 16, 1, 15) of the databased entity/attribute 
record 1904'. 

f = A vector of inverse attribute frequency (IAF) values, the 
determination of which was described above with 
reference to FIG. 16B. 

N2 = The number of filler attributes to be added 
(conceptually) to the entity/attribute records being 
correlated. 

d = The value of the default vote. 

The optimal value for N2 depends on the particular 
application and can therefore be determined empirically. Typical 
values may be 100, 1000, or 10000. As expected, if N2 is 
zero (0), the above equation reduces to a simple correlation 
equation. If the default value d is zero (0), the above 
equation can be reduced. 

§3.3.3 Tuning (Amplifying) a Correlation Value 

Although the present invention is useful for predicting an 
attribute of an entity, in one particular application, the 
present invention is used to suggest content (which is a type 
of attribute) which may be of interest to a user (which is a 
type of entity). In this context, the goal of the present 
invention is more to properly rank order content than to 
predict a user's rating of that content. The tuning aspect of 
the present invention employs a tuning parameter to 
"amplify" content likely to be of interest and to "suppress" 
content unlikely to be of interest. In the present invention, a 
tuned correlation value r' (or entity weight as shown in step 
1810 of FIG. 18) is defined as follows: 

$$
r' = \begin{cases} r^{p} & \text{if } r \ge 0 \\ -(-r)^{p} & \text{if } r < 0 \end{cases} \qquad (10)
$$

where p is a tunable parameter. 

The tunable parameter p is typically greater than 1. The 
optimal value for the tunable parameter p depends on the 
particular application and can therefore be determined 
empirically. A typical value for the tunable parameter p is 2.5. 

§3.3.4 Comparative Analysis 

An experiment was conducted to compare predictions 
(suggestions) made in accordance with the present invention 
and those made in accordance with algorithms of the known 
GroupLens system. In the following experiment, the entities 
were Internet users (5000 (test set) uniformly, randomly 
selected users from 32711 (training set) uniformly, randomly 
selected users who visited the Internet site 
"www.microsoft.com" within the seven (7) days from Feb. 26, 1997 to Mar. 
24, 1997, inclusive), the attributes were Vroots (each 
"Vroot", or "Virtual Root", is a sub-area of the 
www.microsoft.com website) of the web site visited, and the attribute 
values were 1 if the user visited only one page in a Vroot 
within a sample period and 2 if the user visited two or more 
pages within a Vroot within the sample period. 

Suggestion accuracy was determined by: (i) going 
through the 5000 user/Vroot records and selecting one user 
at a time as a "new user"; (ii) randomly selecting one 
attribute to "hold back" from the selected new user; (iii) 
performing a collaborative filtering on the new user's other 
attributes to generate a set of recommendations; (iv) 
determining whether the attribute held back was on the list of 
recommendations; (v) estimating the probability that the 
new user would navigate far enough down through the 
suggested Vroot list to see the held-back attribute (where the 
probability was modeled as "half-life" decay with a 100% 
chance of seeing the first recommendation, 50% chance of 
seeing the 5th recommendation, 25% chance of seeing the 
10th recommendation, etc.); and (vi) averaging these 
probabilities over all 5000 users to determine an accuracy value. 

The experiment was conducted without a popularity 
filtering process disclosed in U.S. patent application Ser. No. 
08/807,566, filed on Feb. 28, 1997, and entitled "Methods 
and Apparatus for Retrieving and/or Processing Retrieved 
Information as a Function of a User's Estimated 
Knowledge", (now U.S. Pat. No. 6,006,218 issued Dec. 21, 
1999) incorporated herein by reference. Popularity filtering 
basically performs post processing on the results of the 
present invention. The accuracies are normalized by 
dividing them by the accuracy of a "popularity list" (i.e., a list of 
all attributes ranked from most popular to least popular). 
PROCESS                                   ACCURACY RELATIVE TO 
                                          POPULARITY LIST 

Correlation with Inverse Attribute        1.278 
Frequency, Attribute Expansion, 
Default Voting and Tuning 
(N2 = 10000, d = 0, p = 2.5) 

Correlation with Attribute                1.249 
Expansion, Default Voting, and 
Tuning (N2 = 10000, d = 0, p = 2.5) 

Similarity with Inverse Attribute         1.240 
Frequency 

Correlation with Inverse Attribute        1.222 
Frequency, Attribute Expansion and 
Default Voting (N2 = 10000, d = 0, p = 1) 

Similarity                                1.215 

Correlation with Attribute                1.205 
Expansion and Default Voting 
(N2 = 10000, d = 0, p = 1) 

Popularity List                           1.000 

GroupLens                                 0.053 



As the above results show, when compared with the 
GroupLens process, the use of filler attributes and default 
votes substantially improves accuracy. This is apparently 
because the GroupLens process is not getting enough data 
(even with 5000 users) from the data set. Using the inverse 
attribute frequency and tuning aspects of the present 
invention, in addition to the use of filler or expansion 
attributes and default votes, further enhances accuracy. 
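The correlation variants compared above combine three ingredients: inverse attribute frequency weights f, N2 filler attributes with default vote d, and the tuning exponent p. The following sketch shows one way these could be computed without enumerating the filler attributes. It is an illustration under stated assumptions, not the patented implementation: the function names are hypothetical, the IAF weights in the sample call are made up, and a zero denominator is arbitrarily mapped to a correlation of 0.

```python
import math


def correlation_with_fillers(x, y, f, n2, d):
    """IAF-weighted correlation of x and y with n2 implicit filler attributes.

    Each filler attribute is treated as having value d in both records.
    """
    sf = sum(f) + n2
    sx = n2 * d + sum(fi * xi for fi, xi in zip(f, x))
    sy = n2 * d + sum(fi * yi for fi, yi in zip(f, y))
    sxy = n2 * d * d + sum(fi * xi * yi for fi, xi, yi in zip(f, x, y))
    sxx = n2 * d * d + sum(fi * xi * xi for fi, xi in zip(f, x))
    syy = n2 * d * d + sum(fi * yi * yi for fi, yi in zip(f, y))
    denominator = math.sqrt((sf * sxx - sx ** 2) * (sf * syy - sy ** 2))
    if denominator == 0:
        return 0.0  # assumption: treat a degenerate case as "no correlation"
    return (sf * sxy - sx * sy) / denominator


def tune(r, p=2.5):
    """Amplify strong correlations and suppress weak ones, keeping the sign."""
    return r ** p if r >= 0 else -((-r) ** p)


# Example vectors from the text; the IAF weights here are hypothetical.
x = [100, 17, 18, 1]
y = [20, 16, 1, 15]
f = [0.2, 0.7, 0.7, 0.4]
r = correlation_with_fillers(x, y, f, n2=1000, d=0)
r_tuned = tune(r, p=2.5)
```

Since |r| is at most 1, raising it to a power p greater than 1 shrinks small correlations much more than large ones, which is what lets the tuned weights emphasize the closest matches when ranking content.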

§4 Architecture of Network Application of Invention 
(FIGS. 10 and 11) 

As alluded to above, the present invention may be 
implemented in a network (e.g., the Internet) having a client (e.g., 
a personal computer with an Internet browser application 
such as Microsoft Explorer™) and a server (e.g., an Internet 
server). A high level example of an architecture for 
implementing the present invention in such an environment 1000 
is illustrated in FIG. 10. An exemplary user interface 1100 
for implementing the present invention on an Internet 
browser application is illustrated in FIG. 11. The processes 
for implementing the present invention on an Internet 
browser application are illustrated in FIG. 12. Finally, FIGS. 
13A, 13B, and 13C illustrate the data flow used to (i) 
provide a client with an entity identification means, (ii) build 
an entity/attribute database, and (iii) predict and present 
HTML pages, within a web site, which may be of interest to 
a user, respectively. 

FIG. 10 is a high level block diagram of a network 
environment 1000 in which the methods and apparatus of 
the present invention may operate. The environment 1000 
includes, inter alia, a client (e.g., a personal computer) 1002 
which may communicate data via a network (e.g., the 
Internet) 1006, and a server (e.g., a personal computer) 1004 
which may also communicate data via the network 1006. 

The client 1002 may include processor(s) 1022, storage 
device(s) 1024, and input/output interface(s) 1026, which 
may share a system bus 1028. The storage device(s) 1024 
may store program instructions for implementing at least a 
portion of the process of the present invention. At least a 
portion of the process of the present invention may be 
effected when the processor(s) 1022 executes the stored 
program instructions. The input/output interface(s) 1026 
permits communication with the network 1006, for example 
via an ISDN terminal adapter. The input/output interface(s) 
1026 further functions to condition inputs provided via an 
input device(s) 1020 (e.g., a keyboard, mouse, and/or other 
man-machine interface) and to condition outputs provided to 
an output device(s) 1021 (e.g., a video display, audio 
speakers, etc.). 

Similarly, the server (e.g., a personal computer) 1004 may 
include processor(s) 1032, storage device(s) 1034, and 
input/output interface(s) 1036, which may share a system 
bus 1038. The storage device(s) 1034 may store program 
instructions for implementing at least a portion of the 
process of the present invention. At least a portion of the 
process of the present invention may be effected when the 
processor(s) 1032 executes the stored program instructions. 
The input/output interface(s) 1036 permits communication 
with the network 1006, for example via a modem bank. The 
input/output interface(s) 1036 (e.g., a Small Computer System 
Interface (or "SCSI") protocol unit) may also permit 
records, such as entity/attribute records, to be written to and 
read from a database stored on a storage device (e.g., a 
magnetic or optical disk) 1040. 

The network 1006 may include, inter alia, bridges, 
routers, switching systems, etc., to forward data to an 
addressed (e.g., in accordance with TCP/IP (Transmission 
Control Protocol/Internet Protocol)) destination. 

FIG. 12 is a process diagram which illustrates the opera- 
tion of the methods and apparatus of the present invention in 
the network environment 1000 of FIG. 10. More specifically, 
the client 1002 may include a browser process (or more 
generally, a browser) 1202 which, in turn, includes a user 
interface process (or more generally, a user interface) 1204, 
a navigation process (or more generally, a navigator) 1206, 
and an improved collaborative filter process (or more 
generally, a collaborative filter) 1208. The user interface 
process 1204 may generate a graphical user interface which 
is presented to a user via the output device(s) 1021 (see FIG. 
10). The navigation process 1206 shown in FIG. 12 interacts 
with the user interface process 1204 to allow a user to access 
various content. The improved collaborative filter process 
1208 determines where the user has been (e.g., content that 
the user has requested, downloaded, and/or rendered) from 
the navigation process 1206 and presents suggested content 
to the user via the user interface process 1204 (perhaps by 
way of the navigation process 1206). Each of these pro- 
cesses may be part of an Internet browser application stored 
in an area of the storage device(s) 1024 and carried out by 
the processor(s) 1022. 

Referring to FIG. 11, the user interface process 1204 may 
be used to generate a graphical user interface 1100 to be 
displayed on a video display unit. As shown in FIG. 11, the 
graphical user interface 1100 may display the content of an 
HTML page in an area 1104. The graphical user interface 
1100 may further include an address line 1102 which 
describes the location (e.g., "http://www.microsoft.com/NT") 
of the HTML page being viewed in area 1104, a 
history display area 1122 and a suggestions display area 
1112. More specifically, the history display area 1122 sets 
forth where the client entity 1002 has been (or what content 
the client entity 1002 has requested, downloaded, and/or 
rendered) within an Internet web site (or, alternatively, areas 
visited or content consumed in general). In this example, the 
client entity 1002 has been to a JAVA HTML page 1124a, an 
MS OFFICE HTML page 1124b, and an EXCEL HTML 
page 1126, which, as inferred by the indent, was accessed 
via the MS OFFICE HTML page 1124b. The suggestions 
display area 1112 lists, and/or provides hyper-text links to, 
HTML pages or content which the improved collaborative 
filter process 1208 predicts that the client entity 1002 may 
want to render (or consume). As one familiar with the 
Internet knows, a hyper-text link is special text (i) which is 
linked with an Internet address which corresponds to an 
HTML page or content associated with the hyper-text link 
and (ii) which, when activated (e.g., by a double mouse 
click), connects the client entity 1002 to the new Internet 
address. The pointer icon 1116 may be moved about the 
display 1100 via an input device (e.g., a mouse or trackball) 
and may be used, inter alia, to activate a hyper-text link. 

Finally, in addition to, or instead of, the history screen 
area 1122, an attribute screen area 1132 may be provided in 
which implicitly determined or explicitly entered attributes 
of the client entity 1002 may be displayed, entered, and/or 
edited. In this instance, the attribute screen area 1132 notes 
that the entity (user) is a 29 year old, German male. 

The input/output interface process(es) (or more generally 
an input/output interface(s)) 1210 may include a stacked 
layer of communications protocols, layers of which may be 
carried out when the processor(s) 1022 executes program 
instructions which may be stored in an area of the storage 
device(s) 1024, and other layers of which may be carried out 
by the input/output interface(s) 1026. 

The networking process (or more generally, a network) 
1220 is carried out by routers, bridges, and/or switching 
systems in order to forward data to an addressed location in 
a way known to those skilled in the art. In this particular 
application, the networking process 1220 can, inter alia, 
transfer requests for content from the client 1002 to the 
server 1004, and transfer the requested content from the 
server 1004 back to the requesting client 1002. The 
networking process 1220 can also forward other types of data, 
which will be discussed in more detail below with reference 
to FIGS. 13A through 13C, to an addressed location. 

The server 1004 may include a content transfer process 
(or more generally, a content transfer facility) 1230 and an 
entity/attribute record generation and retrieval process (or 
more generally, an entity/attribute record generator and 
retriever) 1240. The content transfer process 1230 operates 
to (i) accept incoming requests, (ii) translate, if necessary, 
the incoming request to a query of the content database 
1250, (iii) receive the requested content, and (iv) transfer the 
requested content back to the requesting entity. As will be 
described in more detail below, the content transfer process 
1230 also conveys information of the requesting entity and 
the requested content to the entity/attribute record 
generation and retrieval process 1240. 

The entity/attribute record generation and retrieval 
process 1240 operates to (i) accept entity and content 
information, (ii) create new, or update existing, 
entity/attribute records based on the accepted information, (iii) 
store entity/attribute records to the database 1040, and (iv) 
retrieve entity/attribute records from the database 1040. The 
entity/attribute record generation and retrieval process 1240 
may also operate to perform maintenance operations on the 
database 1040, such as purging stale records, reordering 
records, etc. 

Both the content transfer process 1230 and the 
entity/attribute record generation and retrieval process 1240 may 
be stored as program instructions in an area of the storage 
device(s) 1034 and carried out by the processor(s) 1032. 

A few preliminary operations should be performed before 
the improved collaborative filter process 1208 may be 
carried out in the context of the environment 1000 of FIG. 
10. More specifically, the client 1002 (and/or the server 
1004) should first have means for identifying the client 1002 
as a particular entity. FIG. 13A is a data flow diagram which 
illustrates the initial provision of entity identification to a 
client on the Internet. Second, the entity/attribute database 
should be generated. FIG. 13B is a data flow diagram which 
illustrates how a database of entity/attributes records is built 
by an Internet server. The way in which the entity/attribute 
database is generated is not crucial for the operation of the 
improved collaborative filter process 1208; it may be 
compiled off-site, compiled by a third party, purchased or rented, 
compiled based on survey data, etc. Once these two 
preliminary operations are performed, the improved 
collaborative filter process 1208 may be executed. FIG. 13C is a data 
flow diagram which illustrates the operation of the present 
invention as implemented on the Internet. In FIGS. 13A 
through 13C, the input/output interface process(es) 1210 and 
the networking process 1220 are not shown because the data 
flow through these processes may occur in a conventional 
manner (e.g., via standard communications protocols), 
which is understood by those skilled in the art. 

Referring now to FIG. 13A which, to reiterate, is a data 
flow diagram which illustrates the initial provision of entity 
identification to a client 1002 on the Internet, a user first 
selects a specific HTML page or content via the client user 
interface process 1204. Referring back to FIG. 11, this 
selection may be made by typing an Internet address on the 
address line 1102 or by activating (e.g., by double clicking) 
a hyper-text link. This selection 1302 is relayed to the 
navigation process 1206 which, in response, forms an 
HTML request 1304 and transmits this request to the server 
content transfer process 1230. The content transfer process 
1230 formulates a content request 1306 to the database of 
stored content 1250 in response to the HTML request 1304. 
The requested content 1308 is then returned to the server 
content transfer process 1230. 

The server content transfer process 1230 then adds a data 
structure to the content to permit the server 1004 to later 
recognize (or uniquely identify) the client 1002. 
More specifically, the data structure permits the server 1004 
to distinguish clients 1002. As is known to those 
familiar with the Internet, this data structure may be a 
"cookie." A cookie includes a name value and may further 
include an expiration date value, a domain value, and a path 
value. Thus, the content associated with the requested 
HTML page and the cookie 1310 are transferred from the 
server content transfer process 1230 to the navigation 
process 1206 (or to the browser process 1202 in general). 

The navigation process (or browser process 1202 in 
general) provides the content 1312 to the user interface 
process 1204 and may present a query 1314 to the user 
interface process 1204 to determine whether or not the client 
1002 wants to save the cookie. If the browser process 1202 
already includes a default value or setting which 
automatically decides the fate of the cookie, the message 1314 is not 
generated. Assuming that the user wants to save the cookie 
(i.e., whether or not the cookie is to be saved), the user 
interface process 1204 so informs the navigation process 
1206 (or the browser process 1202 in general) with message 
1316. The cookie is then saved in an appropriate area of the 
storage device(s) 1024 of the client 1002. 

Referring now to FIG. 13B which, to reiterate, is a data 
flow diagram which illustrates how a database of 
entity/attributes records may be built by an Internet server 1004, a 
user first selects a specific HTML page or content at the 
client user interface process 1204. Again, this selection may 
be made by typing an Internet address on the address line 
1102 or by activating (e.g., by double clicking) a hyper-text 
link. This selection 1320 is relayed to the navigation process 
1206 which, in response, forms an HTML request 1322 and 
transmits this request to the server content transfer process 
1230. The content transfer process 1230 formulates a 
content request 1324 to the database of stored content 1250 in 
response to the HTML request 1322. The requested content 
1326 is then returned to the server content transfer process 
1230. 
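
By way of illustration only, the cookie exchange of FIG. 13A may be sketched as follows. The sketch assumes a hypothetical function `serve_content`, a randomly generated name value, a one-year expiration, and placeholder domain and path values; none of these particulars is specified above, and only the four cookie fields (name, expiration date, domain, path) are taken from the description.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Tuple


@dataclass(frozen=True)
class Cookie:
    name: str                           # name value; identifies the client entity
    expires: Optional[datetime] = None  # optional expiration date value
    domain: Optional[str] = None        # optional domain value
    path: Optional[str] = None          # optional path value


def serve_content(content: str, cookie: Optional[Cookie],
                  now: datetime) -> Tuple[str, Cookie]:
    """Return the requested content together with a cookie for the client.

    Hypothetical helper: if the request carried no cookie, a new one is
    issued so that the server can later recognize the same client.
    """
    if cookie is None:
        cookie = Cookie(
            name=uuid.uuid4().hex,             # assumed: random identifier
            expires=now + timedelta(days=365),  # assumed: one-year lifetime
            domain="example.com",              # placeholder domain value
            path="/",                          # placeholder path value
        )
    return content, cookie
```

A client that saves the returned cookie and presents it with later requests receives the same cookie back, which is what allows the server to distinguish one client from another.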
The server content transfer process 1230 then assembles 
the requested content, together with a cookie and/or a 
request for cookies and/or a request for client attributes, and 
returns the resulting data 1328 to the navigation process 
1206 (or to the browser process 1202). The content 1340 
may then be provided to the user interface process 1204 for 
presentation to the user via an output device 1021. Before, 
after, or concurrently with the transfer of the content, the 
navigation process 1206 (or the browser process 1202 in 
general) may return its appropriate cookie and/or attributes 
in data transfer 1330 to the server content transfer process. 
Note that the browser process 1202 of the client 1002 may 
have included an appropriate cookie in the original HTML 
request 1322. In a dumb mode, the client 1002 can include 
all of its cookies in the HTML request 1322. If, however, the 
cookies stored by the client 1002 include domain and path 
values, only the cookie corresponding to the HTML request 
is provided. If the cookie or cookies are provided with the 
original HTML request 1322, the transfer of cookies and/or 
entity attributes in data transfers 1328 and 1330 is either (a) 
not needed or (b) may be limited to requesting and returning 
client attributes. 

In any event, the server content transfer process 1230 
provides the cookie(s) and/or client attributes 1330' to the 
entity/attribute record generation and retrieval process 1240. 
Recall from FIG. 1A that a record 102 may be identified by 
an entity 106 and has a number of attributes 104. The 
entity/attribute record generation and retrieval process 1240 
will either (a) update an existing record 102 or (b) create a 
new record 102, depending upon whether or not a record 
corresponding to the entity (as identified by the cookie) 
exists. More specifically, the entity/attribute record 
generation and retrieval process 1240 dispatches a record request 
1332 to the entity/attribute record database 1040. 
Alternatively, the record request 1332 may be dispatched to 
an entity table to see if the entity 106 exists within the 
database 1040. If a record 102 having an entity value 106 
identified by the cookie exists, the requested record 102 is 
returned in data transfer 1334. The entity/attribute record 
generation and retrieval process 1240 then updates the 
attributes of the record 102, if necessary, and returns the 
updated record 102' to the database 1040 in data transfer 
1336. For example, referring to the database of FIG. 1B, if 
the client 1002, which was identified as user 2 156b by its 
cookie, visited the e-mail HTML page 154b, the second 
attribute 154b of the record 152b would be updated to reflect 
this visit (or content consumption). Otherwise, the 
entity/attribute record generation and retrieval process 1240 will 
create a new record for the new client entity 1002. 

Now that the two preliminary operations have been 
performed (namely, (i) the client has means (e.g., a stored 
cookie) for identifying it as a particular entity, and (ii) the 
entity/attribute database has been generated), the 
environment 1000 in which the present invention is employed may 
use the present invention for collaborative filtering. FIG. 
13C is a data flow diagram which illustrates the operation of 
the present invention as implemented on the Internet. First, 
a user selects a specific HTML page or content at the client 
user interface process 1204. Again, this selection may be 
made by typing an Internet address on the address line 1102 
or by activating (e.g., by double clicking) a hyper-text link. 
This selection 1340 is relayed to the navigation process 
1206. In response, the navigation process 1206 transmits an 
alert message 1342 to the improved collaborative filter 
process 1208 to determine whether the process is available. 
Assuming that the browser has the improved collaborative 
filter process 1208, a filter available message 1344 is 
returned to the navigation process 1206. Alternatively, a 
setting flag may be used to determine whether or not the 
browser process 1202 includes the improved collaborative 
filter process 1208. 

In any event, if the improved collaborative filter process 
1208 is available, the navigation process 1206 assembles a 
message 1346 including an HTML request and a flag or 
message indicating that the improved collaborative filter 
process 1208 is available. Recall from FIG. 13B that this 
message 1346 may include a cookie. The navigation process 
1206 addresses the message 1346 to the server content 
transfer process 1230. The content transfer process 1230 
formulates a content request 1348 to the database of stored 
content 1250 in response to the HTML request 1346. The 
requested content 1350 is then returned to the server content 
transfer process 1230. The server content transfer process 
1230 may then transfer the requested HTML content to the 
navigation process 1206 in data transfer 1357. If the transfer 
of the HTML content occurs at this time, it will be sent from 
the navigation process 1206 to the client user interface 
process 1204 in data transfer 1359. Otherwise, as discussed 
below, the requested HTML content will be returned to the 
navigation process 1206 later. 

Before, after, or concurrently with the content request 
1348, the server content transfer process 1230, recognizing 
that the client has an improved collaborative filter available, 
sends a request 1352 for records 102 to the entity/attribute 
record (generation and) retrieval process 1240. As alluded to 
above, if the entity/attribute database 1040 is generated 
outside of the system 1000, the process 1240 will obviously 
not need to generate the records 102, although it may be 
used to update records 102. In any event, the entity/attribute 
record (generation and) retrieval process 1240 sends the 
request 1352' for records 102 to the entity/attribute database 
1040. In response, all of the records or a sample of the 
records 1354 are sent back to the entity/attribute record 
(generation and) retrieval process 1240. This information 
1354 is then forwarded to the server content transfer process 
1230 in data transfer 1354'. To reiterate, in some instances, 
only a sample of all of the records is needed. In an 
application similar to the one illustrated in FIG. 1B in which 
the entities are computers and the attributes are HTML pages 
of an Internet web site, a one (1) megabyte file with 15,000 
entities 156 (or 15,000 records 152) was considered an 
adequate sample size. 

The server content transfer process 1230 returns the 
sample records (and the requested HTML content if not 
returned earlier in data transfer 1357) to the navigation 
process 1206 of the client 1002 in data transfer 1356. 
Naturally, the server content transfer process 1230 may 
return the HTML content and the sample records to the 
navigation process 1206 of the client 1002 in any temporal 
order. For example, the HTML content may be downloaded 
first (see, e.g., transfer 1357) so that the user of the client 
1002 will quickly see the requested content (e.g., in display 
area 1104). The download of the sample records and 
subsequent processing, discussed below, may occur while the 
user of the client 1002 is viewing the requested content. In 
any event, if the navigation process 1206 has not already 
provided the HTML content to the client user interface 
process 1204, it does so with data transfer 1358. 

The navigation process 1206 will also provide the sample 
records to the improved collaborative filter process in data 
transfer 1360. These sample records are processed, as 
discussed above with reference to FIG. 3 or FIG. 15, and 
predictions (in this case, HTML content which the user of 
the client 1002 is likely to want) are made by the improved 
collaborative filter process 1208 and provided to the 
navigation process 1206 in data transfer 1362. The navigation 
process 1206 may forward these predictions (e.g., 
recommended HTMLs) to the client user interface process 1204 in 
data transfer 1364. The client user interface process 1204 
may bring these suggested HTMLs to the user's attention, 
for example, in the display portion 1112 of the graphical user 
interface 1100 of FIG. 11. 
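
By way of illustration only, the record handling of FIGS. 13B and 13C (update an existing entity/attribute record or create a new one, then return all records or a sample) may be sketched as follows. The in-memory dictionary, the function names `record_visit` and `sample_records`, the use of the value 1 to mark a visited page, and the seeded random sampling are all assumptions made for this sketch; the description above does not prescribe a storage layout or a sampling method.

```python
import random
from typing import Dict, List, Tuple

# Assumed layout: entity identifier (e.g., a cookie name value)
# mapped to {attribute (e.g., an HTML page): value}.
Database = Dict[str, Dict[str, int]]


def record_visit(db: Database, entity: str, page: str) -> None:
    """Update the entity's record if it exists; otherwise create one."""
    record = db.get(entity)
    if record is None:
        db[entity] = {page: 1}   # (b) create a new record
    else:
        record[page] = 1         # (a) update the existing record


def sample_records(db: Database, max_records: int,
                   seed: int = 0) -> List[Tuple[str, Dict[str, int]]]:
    """Return all records, or a random sample when there are too many."""
    items = sorted(db.items())
    if len(items) <= max_records:
        return items
    return random.Random(seed).sample(items, max_records)
```

In this sketch the sample size is a caller-supplied limit; the 15,000-record figure mentioned above would simply be one possible value of `max_records`.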

Naturally, referring back to FIG. 2, the improved collabo- 
rative filter engine (or process) 202 may be located any- 
where within the network 1000, not necessarily at the client 
1002. Similarly, the entity/attribute data record retriever 206 
and database of entity/attribute data records 208 may be 
located elsewhere in the network 1000, not necessarily at the 
server 1004. Basically, these elements should be located to 
optimize the underlying application. Similarly, the network 
need not be the Internet, but rather, may be an Intranet or 
another network configuration. Indeed, as should be appar- 
ent from FIG. 2, the present invention may be employed at 
a single location, in a non-distributed manner. Various 
modifications of the apparatus and methods of the present 
invention, as well as environments in which the present 
invention may operate will be apparent to those skilled in the 
art. 

What is claimed is: 

1. A method for use in computer-implemented apparatus 
for providing an indication that specifies a degree to which 
(i) a first entity having a first set of attributes and (ii) a 
second entity having a second set of attributes match each 
other, each of said sets of attributes reflecting predefined 
characteristics of an associated one of the entities, wherein 
the apparatus has a processor, a storage device, connected to 
the processor and storing both computer executable instruc- 
tions and a data structure therein, the data structure storing 
both a corresponding parameter defining each of said entities 
and a set of attribute parameters for a corresponding set of 
the attributes associated with said each entity, wherein the 
method, performed by the processor in response to execu- 
tion of the instructions implements a collaborative filter 
match engine, comprises the steps of: 

a) accessing the data structure; 

b) manipulating, through interaction with the storage 
device, at least a portion of the data structure stored 
within the storage device to form an expanded set of 
attributes by extending at least one of (i) the first set of 
attributes to form a first extended set of attributes which 
includes attributes found in neither the first nor second 
sets of attributes, and (ii) the second set of attributes to 
form a second extended set of attributes which includes 
attributes found in neither the first nor second sets of 
attributes; 

c) determining, in response to attribute parameters asso- 
ciated with the first and second entities in the expanded 
set of attributes stored within the data structure, a 
degree to which the first and second entities, at least 
one of which having a corresponding extended set of 
attributes, match; and 

d) generating the indication, as output, that specifies the 
degree of match. 

2. The method of claim 1 wherein both (i) the first set of 
attributes is extended to form the first extended set of 
attributes which includes attributes found in neither the first 
set of attributes nor the second set of attributes, and (ii) the 
second set of attributes is extended to form the second 
extended set of attributes which includes attributes found in 
neither the first set of attributes nor the second set of 
attributes. 
3. The method of claim 2 wherein the first and second 
extended sets of attributes are collectively defined as a union 
of (a) the first set of attributes, (b) the second set of 
attributes, and (c) a third set of attributes found in neither the 
first set of attributes nor the second set of attributes. 

4. The method of claim 3 wherein at least some of the 
attribute parameters associated with the first set of attributes 
have associated values and at least some of the attribute 
parameters associated with the second set of attributes have 
associated values. 

5. The method of claim 4 further comprising a step of: 
assigning, with the machine, a default value to any of said 
attribute parameters not having an associated value, 
wherein the step of assigning a default value is per- 
formed before the step of determining the degree to 
which the first and second entities match. 

6. The method of claim 2 wherein at least some of the 
attribute parameters associated with the first set of attributes 
have associated values and at least some of the attribute 
parameters associated with the second set of attributes have 
associated values. 

7. The method of claim 6 further comprising a step of: 
assigning, with the machine, a default value to any of said 
attribute parameters not having an associated value, 
wherein the step of assigning a default value is per- 
formed before the step of determining the degree to 
which the first and second entities match. 

8. The method of claim 1 wherein at least some of the 
attribute parameters associated with the first set of attributes 
have associated values and at least some of the attribute 
parameters associated with the second set of attributes have 
associated values. 

9. The method of claim 8 further comprising a step of: 
assigning, with the machine, a default value to any of said 
attribute parameters not having an associated value, 
wherein the step of assigning a default value is per- 
formed before the step of determining the degree to 
which the first and second entities match. 

10. The method of claim 1 wherein the step of determin- 
ing a degree to which the first and second entities match is 
performed by correlating the attribute parameters associated 
with the first and second entities and contained in the 
expanded set. 

11. The method of claim 1 wherein the step of determining 
a degree to which the first and second entities match is 
performed by determining a similarity between the attribute 
parameters associated with the first and second entities and 
contained in the expanded set. 

12. The method of claim 1 further comprising a step of: 
determining, with the machine, a weight based on the 
degree to which the first and second entities match. 

13. The method of claim 12 wherein the weight deter- 
mined is a normalized value of the degree to which the first 
and second entities match. 
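
By way of illustration only, the matching recited in claims 1 through 11 may be sketched as follows. The sketch assumes that the "correlating" of claim 10 is a Pearson correlation, that the default value is 0, and that a zero variance is reported as a match of 0.0; the function names `expand` and `match_degree` and the parameter `extra_attributes` are hypothetical and are not taken from the claims.

```python
import math
from typing import Dict, Iterable, List


def expand(record: Dict[str, float], attributes: List[str],
           default: float) -> List[float]:
    """Give the record a value for every attribute, using the default."""
    return [record.get(a, default) for a in attributes]


def match_degree(first: Dict[str, float], second: Dict[str, float],
                 extra_attributes: Iterable[str] = (),
                 default: float = 0.0) -> float:
    """Correlate two entities over an expanded set of attributes.

    The expanded set is the union of both attribute sets plus
    `extra_attributes`, i.e., attributes found in neither set.
    """
    attributes = sorted(set(first) | set(second) | set(extra_attributes))
    n = len(attributes)
    if n == 0:
        return 0.0
    x = expand(first, attributes, default)
    y = expand(second, attributes, default)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # assumed convention: correlation is undefined here
    return cov / math.sqrt(var_x * var_y)
```

Under these assumptions, two entities that each have the value 1 for the same two attributes yield no usable correlation on their own, because neither vector varies; adding two expansion attributes with the default value 0 gives both vectors variance, and the match becomes 1.0.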

14. A computer-readable medium having computer 
executable instructions stored therein, said instructions 
being executed by a computer for performing the steps of 
claim 1. 

15. A method for use in computer-implemented apparatus 
for predicting a value of an attribute of an entity, having a 
first set of attributes, based on values of attributes associated 
with a group of entities, each of the entities of the group has 
a corresponding set of said attributes which reflect pre- 
defined characteristics thereof, the apparatus having a 
processor, and a storage device, connected to the processor 
and storing both computer executable instructions and a data 
structure therein, the data structure storing, for each one of 
said entities, both a corresponding parameter defining said 
each one entity and attribute values associated with said 
each one entity, wherein the method comprises the steps, 
performed by the processor in response to execution of the 
instructions, of: 

a) accessing the data structure; 

b) manipulating, through interaction with the storage 
device, at least a portion of the data structure stored 
within the storage device to form an expanded set of 
attributes by extending the first set of attributes to 
include attributes found in neither the first set of 
attributes nor associated with at least one other entity in 
the group of entities; 

c) determining, for each one of the entities in the group 
other than the first entity and in response to values of 
the attributes, in the expanded set, associated with the 
first entity and said each one of the other entities, a 
degree to which the first entity matches said each one 
of the other entities; and 

d) predicting the value of one of the attributes associated 
with the first entity based on a sum, taken over the 
attribute values associated with all of said entities in the 
group, of a function of (i) a value of each of the 
attributes associated with each one of the other entities, 
and (ii) a degree to which the first entity matches said 
each one of the other entities. 

16. The method of claim 15 wherein the function is a 
product. 

17. The method of claim 16 further comprising the step of: 
assigning a default value to any of the attributes not 
having an associated value before determining the 
degree to which the first entity and said each one of the 
other entities match. 

18. The method of claim 15 wherein the expanded set of 
attributes is defined by a union of (i) the first set of attributes 
for the first entity and (ii) the set of attributes of every other 
one of the entities of the group and (iii) attributes found in 
neither the first set of attributes nor the set of attributes of 
every other one of the entities of the group of entities. 

19. The method of claim 18 further comprising the step of: 
assigning a default value to any of the attributes not 
having an associated value before determining the 
degree to which the first entity and said each one of the 
other entities match. 

20. The method of claim 15 further comprising the step of: 
assigning a default value to any of the attributes not 
having an associated value before determining the 
degree to which the first entity and said each one of the 
other entities match. 

21. The method of claim 15 wherein the step of deter- 
mining the degree to which the first entity and each other one 
of the entities match is performed by correlating the attribute 
values associated with the first entity with the attribute 
values associated with said each other one of the entities. 

22. The method of claim 15 wherein the step of deter- 
mining the degree to which the first entity and each other one 
of the entities match is performed by determining similarity 
between the attribute values associated with the first entity 
and the attribute values associated with said each other one 
of the entities. 

23. The method of claim 15 further comprising the step of: 
for each of said entities in the group, determining a weight 
based on the degree to which the first entity and said 
each other one of the entities match, wherein the weight 
determining step is performed before predicting the 
value of one of the attributes of the first entity. 
24. The method of claim 23 wherein, for each other one 
of said entities, the weight associated therewith is a normal- 
ized value of the degree to which the first entity and said 
each other one of the entities match. 
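
By way of illustration only, the prediction recited in claims 15, 16, 23 and 24 may be sketched as a sum of products of a normalized weight and an attribute value. The sketch assumes that normalization divides each degree of match by the sum of the absolute degrees of match, and that an entity lacking the attribute contributes a default value of 0; the claims do not fix either choice, and the function names are hypothetical.

```python
from typing import Dict


def normalized_weights(matches: Dict[str, float]) -> Dict[str, float]:
    """Scale the degrees of match so their absolute values sum to one."""
    total = sum(abs(m) for m in matches.values())
    if total == 0.0:
        return {entity: 0.0 for entity in matches}
    return {entity: m / total for entity, m in matches.items()}


def predict_attribute(matches: Dict[str, float],
                      values: Dict[str, float],
                      default: float = 0.0) -> float:
    """Predict the first entity's value for one attribute.

    `matches` maps each other entity to its degree of match with the
    first entity; `values` maps each other entity to its known value
    for the attribute being predicted.
    """
    weights = normalized_weights(matches)
    return sum(w * values.get(entity, default)
               for entity, w in weights.items())
```

Because the weights are normalized, the prediction stays on the same scale as the attribute values of the other entities, so that entities which match the first entity more closely contribute proportionally more.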

25. A computer-readable medium having computer 
executable instructions stored therein, said instructions 
being executed by a computer for performing the steps of 
claim 15. 

26. Computer-implemented apparatus for providing an 
indication that specifies a degree to which (i) a first entity 
having a first set of attributes and (ii) a second entity having 
a second set of attributes match each other, each of said sets 
of attributes reflecting predefined characteristics of an asso- 
ciated one of the entities, the apparatus comprising: 

a processor; and 

a storage device, connected to the processor and storing 
both computer executable instructions and a data struc- 
ture therein, the data structure storing both a corre- 
sponding parameter defining each of said entities and a 
set of attribute parameters for a corresponding set of the 
attributes associated with said each entity; 

wherein the processor, in response to execution of the 
instructions implements a collaborative filter match 
engine which: 

a) manipulates, through interaction with the storage 
device, at least a portion of the data structure stored 
within the storage device to form an expanded set of 
attributes by extending at least one of (i) the first set 
of attributes to form a first extended set of attributes 
which includes attributes found in neither the first 
nor second sets of attributes, and (ii) the second set 
of attributes to form a second extended set of 
attributes which includes attributes found in neither 
the first nor second sets of attributes; 

b) determines, in response to attribute parameters asso- 
ciated with the first and second entities in the 
expanded set of attributes stored within the data 
structure, a degree to which the first and second 
entities, at least one of which having a corresponding 
extended set of attributes, match; and 

c) generates the indication, as output, that specifies the 
degree of match. 

27. The apparatus of claim 26 wherein both (i) the first set 
of attributes is extended to form the first extended set of 
attributes which includes attributes found in neither the first 
set of attributes nor the second set of attributes, and (ii) the 
second set of attributes is extended to form the second 
extended set of attributes which includes attributes found in 
neither the first set of attributes nor the second set of 
attributes. 

28. The apparatus of claim 27 wherein the first and second 
extended sets of attributes are collectively defined as a union 
of (i) the first set of attributes, (ii) the second set of attributes, 
and (iii) attributes found in neither the first set of attributes 
nor the second set of attributes. 

29. The apparatus of claim 26 wherein at least some of the 
attribute parameters associated with the first set of attributes 
have associated values and at least some of the attribute 
parameters associated with the second set of attributes have 
associated values; and 

wherein the processor, in response to the stored 
instructions, assigns a default value to any of the 
attribute parameters not having an associated value, 
before determining the degree to which the first and 
second entities match. 

30. The apparatus of claim 26 wherein each of the first and 
second entities are selected from a group consisting of (A) 
a computer, (B) a computer user, (C) a local area computer 
network, (D) an association, (E) a network browser, and (F) an 
internetwork browser. 

31. The apparatus of claim 30 wherein at least one of the 
attributes identifies content which may be requested by the 
first and second entities. 

32. Computer-implemented apparatus for predicting a 
value of an attribute of an entity, having a first set of 
attributes, based on values of attributes associated with a 
group of entities, each of the entities of the group has a 
corresponding set of said attributes which reflect predefined 
characteristics thereof, the apparatus comprising: 

a processor; and 

a storage device, connected to the processor and storing 
both computer executable instructions and a data 
structure therein, the data structure storing, for each one of 
said entities, both a corresponding parameter defining 
said each one entity and attribute values associated with 
said each one entity; 

wherein the processor, in response to execution of the 
instructions: 

a) manipulates, through interaction with the storage 
device, at least a portion of the data structure stored 
within the storage device to form an expanded set of 
attributes by extending the first set of attributes to 
include attributes found in neither the first set of 
attributes nor associated with at least one other entity 
in the group of entities; 

b) determines, for each one of the entities in the group 
other than the first entity and in response to values of 
the attributes, in the expanded set, associated with 
the first entity and said each one of the other entities, 
a degree to which the first entity matches said each 
one of the other entities; and 
c) predicts the value of one of the attributes associated 
with the first entity based on a sum, taken over the 
attribute values associated with all of said entities in 
the group, of a function of (i) a value of each of the 
attributes associated with each one of the other 
entities, and (ii) a degree to which the first entity 
matches said each one of the other entities. 

33. The apparatus of claim 32 wherein the function is a 
product. 
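
For illustration only, the prediction recited in step c) of claim 32, with the function taken to be a product as in claim 33, might be sketched as follows. The names predict_attribute, match_degree, others and default are hypothetical and do not appear in the claims, and the sketch assumes the degrees of match have already been determined by some separate process.

```python
from typing import Dict

# Hypothetical representation: an entity is a mapping from attribute
# names to numeric attribute values.
Entity = Dict[str, float]


def predict_attribute(
    match_degree: Dict[str, float],
    others: Dict[str, Entity],
    attribute: str,
    default: float = 0.0,
) -> float:
    """Sum, over the other entities of the group, of the product of
    (i) the entity's value for `attribute` and (ii) the degree to which
    the first entity matches that entity."""
    total = 0.0
    for name, attrs in others.items():
        # Entities lacking a value contribute the assumed default value.
        total += attrs.get(attribute, default) * match_degree[name]
    return total


# Example: two other entities with assumed degrees of match.
others = {"b": {"x": 1.0}, "c": {"x": 0.5}}
match_degree = {"b": 0.5, "c": 0.2}
prediction = predict_attribute(match_degree, others, "x")
```

Any function other than a product could be substituted in the loop body; the product is used here only because claim 33 names it.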

34. The apparatus of claim 32 wherein the expanded set 
of attributes is defined by a union of (i) the first set of 
attributes for the first entity and (ii) the set of attributes of 
every other one of the entities of the group and (iii) attributes 
found in neither the first set of attributes nor the set of 
attributes of every other one of the entities of the group of 
entities. 

35. The apparatus of claim 32 wherein the processor, in 
response to execution of the instructions: 

assigns a default value to any of the attributes not having 
an associated value, before determining the degree to 
which the first entity and said each one of the other 
entities match. 
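
As a rough sketch of the expansion and default-value steps of claims 34 and 35, under stated assumptions: the expanded set is built as a union of every entity's attributes plus additional attributes found in none of them, and any attribute lacking a value receives a default before matching. The function name expand_attributes, the use of None to mark an attribute with no associated value, and the default of 0.0 are all illustrative assumptions rather than details taken from the claims.

```python
from typing import Dict, Iterable, Optional

# Hypothetical representation: None marks an attribute that is present
# but has no associated value.
RawEntity = Dict[str, Optional[float]]


def expand_attributes(
    entities: Dict[str, RawEntity],
    extra_attributes: Iterable[str] = (),
    default: float = 0.0,
) -> Dict[str, Dict[str, float]]:
    """Give every entity the same expanded attribute set, filling any
    attribute that lacks a value with `default`."""
    # Start from attributes found in none of the entities, then take the
    # union with each entity's own attributes.
    expanded = set(extra_attributes)
    for attrs in entities.values():
        expanded.update(attrs)

    result: Dict[str, Dict[str, float]] = {}
    for name, attrs in entities.items():
        filled: Dict[str, float] = {}
        for attribute in sorted(expanded):
            value = attrs.get(attribute)
            filled[attribute] = default if value is None else value
        result[name] = filled
    return result


entities = {"first": {"sports": 1.0}, "other": {"news": None}}
expanded = expand_attributes(entities, extra_attributes=["weather"])
```

After this step every entity carries a value for every attribute in the expanded set, so a degree of match can be computed over a common set of attributes.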